Module - IV
Schema Refinement
Yogendra Prasad P
Introduction
Schema Refinement:
• The Schema Refinement refers to refine the schema by using some
technique. The best technique of schema refinement is
decomposition.
• Normalization or Schema Refinement is a technique of organizing
the data in the database.
Yogendra Prasad P
• It is a systematic approach of decomposing tables to eliminate data
redundancy
Cont.
Data Redundancy:
The same amount of data is held in more than one case with
in a database is called data redundancy.
Problems caused by redundancy:
Storing the same information redundantly, that is, in more
than one place within a database can lead to several problems:
Yogendra Prasad P
• Redundant Storage: Some information is stored repeatedly.
• Update Anomalies: If one copy of such repeated data is updated,
an inconsistency is created unless all copies are similarly updated.
Cont.
• Insertion Anomalies: It may not be possible to store certain
information unless some other information is stored.
• Deletion Anomalies: It may not be possible to delete certain
information without losing some other information.
SID SName Rating Daily_Wage Worked_Hours
1 PK 8 10 32
2 VK 8 10 41
3 Kalyan 7 5 30
Yogendra Prasad P
4 Viraaj 7 5 25
5 Avinash 8 10 35
Table: Instance of an Hourly_Emp Relation
Decomposition
• To avoid redundancy and problems due to redundancy, we
use refinement technique called Decomposition.
• It is a process of decomposing a larger relation into smaller
relations.
• Each of smaller relations contain subset of attributes of
original relation.
If there is no proper decomposition of relations then it may
Yogendra Prasad P
•
lead to problem like loss of information.
Cont.
• We can decompose Hourly_Emp relation into two relations.
1. Hourly_Emp1 (SID, SName, Rating, Worked_hours)
2. Wage (Rating, Daily_Wage)
• Here we can easily record the daily wage for any rating simply
by adding a tuple to daily wage, even if no employee with that
rating appears in the current instance of hourly employee.
Yogendra Prasad P
Cont.
Problems related to decomposition:
• Whenever we are going for decomposition, we must be careful
because decomposition a relational schema can create more
problems than it solves.
• Two important questions must be asked repeatedly
1. Do we need to decompose a relation?
2. What problem (if any) does given decomposition cause?
Yogendra Prasad P
Functional Dependencies
• Functional dependency is a relationship that exist when one
attribute uniquely determines another attribute.
• Functional dependency is a set of constraints between two
attributes in a relation.
• A functional dependency AB in a relation holds true if two
tuples having the same value of attribute A also have the
same value of attribute B
Yogendra Prasad P
Cont.
• If t1.X=t2.X then t1.Y=t2.Y where t1,t2 are tuples and X,Y are
attribues.
• Functional dependency are used to determine the keys in a relation.
B is functionally dependent on A
A B
Determinant Dependant
Yogendra Prasad P
Cont.
Reasoning about functional dependencies:
Types of functional dependencies:
1. Trivial functional dependency: If X Y is a functional
dependency where Y subset X, these type of FD’s called as trivial
functional dependency.
2. Non-trivial functional dependency: If X Y and Y is not
subset of X then it is called non-trivial functional dependency.
Yogendra Prasad P
3. Completely non-trivial functional dependency: If X Y
and X∩Y=Ф(null) then it is called completely non-trivial
functional dependency.
Cont.
Armstrong Axioms/ Inference rules:
Armstrong axioms defines the set of rules for reasoning about
functional dependencies and also to infer all the functional
dependencies on a relational database.
Various axioms rules or inference rules:
Primary axioms:
1. Reflexivity Rule:
Yogendra Prasad P
If X is a set of attributes then X X holds
If X is a set of attributes and Y is a subset of X then X Y holds.
Cont.
2. Augmentation Rule:
If X Y holds and Z is a set of attributes then XZ YZ holds.
3. Transitivity Rule:
If X Y holds and Y Z holds then X Z holds.
Secondary or Derived axioms:
1. Union Rule:
If X Y holds and X Z holds then X YZ holds.
2. Decomposition Rule:
Yogendra Prasad P
If X YZ holds then X Y and X Z holds.
3. Pseudo Transitivity Rule:
If X Y holds and XZ W holds then YZ W holds.
Attribute Closure
• Attribute closure of an attribute set can be defined as set of
attributes which can be functionally determined from it.
• By using attribute closure we can find the Super keys and
Candidate keys in a relation.
• Attribute closure for any attribute can be denoted by X
+ .
Note:
To find attribute closure of an attribute set:
Yogendra Prasad P
1. Add elements of attribute set to the result set.
2. Recursively add elements to the result set which can be functionally
determined from the elements of result set.
Cont.
Example 1: Consider the following list of FD’s and compute the
closure for AG.
R (A, B, C, G, H,I)
F= { A B, A C, CG H, CG I, B H}
Sol:
ITERATION CLOSURE USING
1 AG
2 AGB A B
Yogendra Prasad P
3 AGBC A C
4 ABCGH B H
5 ABCGHI CG I
Cont.
Example 2: Consider the following FD’s and calculate A+ with
R (A, B, C, D)
F= {A B, B C, AB D}
Example 3: Compute the closures for relational schema
R (A, B, C, D, E)
F= {A BC, CD E, B D, E A} and list the keys of R.
Yogendra Prasad P
Cont.
Example 4: Let R (A, B, C, D, E) is a relational schema with functional
dependencies
F= {AB C, DE B, CD E}
find the closure for
i. B
ii. E
iii. ACE and
iv. BD
Yogendra Prasad P
Example 5: Compute the closures for the set of FD’s
R (A, B, G, H, I)
F= {A B, G H, G I, B H} and list the keys of R.
Normalization
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize the redundancy from a relation
or set of relations.
• It is also used to eliminate the undesirable characteristics like
Insertion, Update and Deletion Anomalies.
• Normalization divides the larger table into the smaller table and
links them using relationship.
Yogendra Prasad P
• The normal form is used to reduce redundancy from the database
table.
Cont.
Types of Normal Forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. Boyce-Codd Normal Form
5. Fourth Normal Form
Yogendra Prasad P
6. Fifth Normal Form
Cont.
First Normal Form (1NF):
If the database is in 1NF, it should satisfies the
following conditions:
i. Contains only atomic values.
ii. There are no repeating groups.
Yogendra Prasad P
Cont.
Example: Consider the following relation
SID SName SAddress SMobile
8812121212
101 PK Tirupati
9900012222
102 VK Bangalore 8912312390
103 Viraaj Hyderabad 7778881212
104 Kalyan Tirupati 8881877752
9990000123
105 Avinash Bangalore
Yogendra Prasad P
8123450987
This relation is not in 1 NF because it is having non atomic values
Cont.
We re-arrange the relation as below, to convert it to First Normal Form.
SID SName SAddress SMobile
101 PK Tirupati 8812121212
101 PK Tirupati 9900012222
102 VK Bangalore 8912312390
103 Kalyan Hyderabad 7778881212
104 Viraaj Tirupati 8881877752
Yogendra Prasad P
105 Avinash Bangalore 9990000123
105 Avinash Bangalore 8123450987
Cont.
Second Normal Form (2NF):
If the database is in 2NF, it should satisfies the following
conditions:
i. Relation must be in 1NF.
ii. All non-key attributes are fully functional dependent on the
primary key.
An attribute that is not part of any candidate key is known as
Yogendra Prasad P
non-prime attribute.
Cont.
Example: Consider the following relation
Project_Nam
Std_ID Project_ID Std_Name
e
• Here in the Student_Project relation the prime key attributes are
Std_ID and Project_ID.
• According to the rule, non-key attributes, i.e. Std_Name and
Project_Name must be dependent upon both and not on any of the
prime key attribute individually.
Yogendra Prasad P
• But we find that Std_Name can be identified by Std_ID and
Project_Name can be identified by Project_ID independently.
Cont.
• This is called partial dependency, which is not allowed in 2NF.
• We broke the relation in two as depicted as below. So there exists no
partial dependency.
Student
Project_Nam
Std_ID Std_Name
e
Project
Yogendra Prasad P
Project_ID Project_Name
Cont.
Third Normal Form (3NF):
If the database is in 3NF, it should satisfy the following
conditions:
i. Relation must be in 2NF.
ii. Transitive functional dependency of non-prime attribute
on any super key should be removed.
Yogendra Prasad P
Cont.
Example: Consider the following relation
Std_ID Std_Name City Zip
• Here in the Student_Detail relation, Std_ID is the key and only
prime key attribute.
• We find that City can be identified by Std_ID as well as Zip itself.
• Neither Zip is a superkey nor is City a prime attribute.
Yogendra Prasad P
• Additionally, Std_ID → Zip → City, so there exists transitive
dependency.
Cont.
• To bring this relation into third normal form, we break the relation
into two relations as follows
Student_Inf
Std_ID Std_Name Zip
ZipCode
Yogendra Prasad P
Zip City
Cont.
Boyce-Codd Normal Form (BCNF):
• BCNF is the advance version of 3NF. It is stricter than 3NF.
• A relation is in BCNF if every functional dependency X → Y, X is
the super key of the table.
• For BCNF, the table should be in 3NF, and for every FD, LHS is
super key.
Yogendra Prasad P
Cont.
Example: Consider the following relation
Std_Colle Std_Secti Std_Dept
Std_ID Std_Dept
ge on no
In the above table Functional dependencies are as follows:
1. Std_ID → Std_College
2. Std_Dept→ {Std_Section, Std_Deptno}
Yogendra Prasad P
The table is not in BCNF because neither Std_ID nor
Std_Dept alone are super keys.
Cont.
• To convert the given table into BCNF, we decompose it into three
tables:
Student_College
Std_ID Std_College
Student_Section
Std_Dept Std_Section Std_Deptno
Yogendra Prasad P
Student_Dept
Std_ID Std_Dept
Properties of Decomposition
• Decomposition is the process of breaking down in parts or elements.
• It replaces a relation with a collection of smaller relations.
• It breaks the table into multiple tables in a database.
• It should always be lossless, because it confirms that the
information in the original relation can be accurately reconstructed
based on the decomposed relations.
• If there is no proper decomposition of the relation, then it may lead
Yogendra Prasad P
to problems like loss of information.
Cont.
Following are the properties of Decomposition:
1. Lossless Decomposition
2. Dependency Preservation
Lossless Decomposition:
• If the information is not lost from the relation that is decomposed,
then the decomposition will be lossless.
Yogendra Prasad P
• The relation is said to be lossless decomposition if natural joins of
all the decomposition give the original relation.
Cont.
Example: <Employee_Department> Table
Eid Ename Age City Salary Depti DeptNam
d e
E001 PK 21 Tirupati 20000 D001 Finance
E002 VK 32 Tirupati 30000 D002 Production
E003 Lokesh 25 Hyderabad 5000 D003 Sales
E004 Venkatesh 24 Hyderabad 4000 D004 Marketing
E005 Avinash 25 Bangalore 25000 D005 HR
Yogendra Prasad P
Decompose the above relation into two relations to check whether a
decomposition is lossless or lossy.
Cont.
Relation 1: <Employee> Table Relation 2: <Department> Table
Eid Ename Age City Salary Deptid Eid DeptName
E001 PK 21 Tirupati 20000 D001 E001 Finance
E002 VK 32 Tirupati 30000 D002 E002 Production
E003 Kalyan 25 Hyderabad 5000 D003 E003 Sales
D004 E004 Marketing
E004 Vijaaj 24 Hyderabad 4000 D005 E005 HR
E005 Avinash 25 Bangalore 25000
Yogendra Prasad P
Cont.
• So, the above decomposition is a Lossless Join Decomposition, because the
two relations contains one common field that is 'Eid' and therefore join is
possible.
• Now apply natural join on the decomposed relations.
Employee ⋈ Department:
Eid Ename Age City Salary Depti DeptNam
d e
E001 PK 21 Tirupati 20000 D001 Finance
E002 VK 32 Tirupati 30000 D002 Production
Yogendra Prasad P
E003 Kalyan 25 Hyderabad 5000 D003 Sales
E004 Viraaj 24 Hyderabad 4000 D004 Marketing
E005 Avinash 25 Bangalore 25000 D005 HR
Cont.
Dependency Preservation:
• Dependency is an important constraint on the database.
• Everydependency must be satisfied by at least one
decomposed table.
• If
a relation R is decomposed into relation R1 and R2, then
the dependencies of R either must be a part of R1 or R2 or
must be derivable from the combination of functional
Yogendra Prasad P
dependencies of R1 and R2.
Cont.
Example:
• Suppose there is a relation R (A, B, C, D) with functional dependency
set {A->BC}.
• The relational R is decomposed into R1(A,B) and R2(A,C,D) which is
dependency preserving because FD A->BC is a part of relation
R1(ABC).
Yogendra Prasad P
Cont.
Multivalued Dependency:
• Multivalued dependency occurs when two attributes in a table are
independent of each other but, both depend on a third attribute.
• A multivalued dependency consists of at least two attributes that
are dependent on a third attribute that's why it always requires at
least three attributes.
Yogendra Prasad P
Cont.
Example:
Suppose there is a bike manufacturer company which produces
two colors(white and black) of each model every year.
BIKE_MODEL MANUF_YEAR COLOR
M2011 2008 White
M2001 2008 Black
M3001 2013 White
M3001 2013 Black
M4006 2017 White
Yogendra Prasad P
M4006 2017 Black
Here columns COLOR and MANUF_YEAR are dependent on
BIKE_MODEL and independent of each other.
Cont.
• Here columns COLOR and MANUF_YEAR are dependent on
BIKE_MODEL and independent of each other.
• In this case, these two columns can be called as multivalued
dependent on BIKE_MODEL. The representation of these
dependencies is shown below:
BIKE_MODEL → → MANUF_YEAR
BIKE_MODEL → → COLOR
This can be read as "BIKE_MODEL multidetermined
Yogendra Prasad P
•
MANUF_YEAR" and "BIKE_MODEL multidetermined COLOR".
Cont.
Join Dependency:
• Join decomposition is a further generalization of Multivalued
dependencies.
• If the join of R1 and R2 over C is equal to relation R, then we can
say that a join dependency (JD) exists. Where R1 and R2 are the
decompositions R1(A, B, C) and R2(C, D) of a given relations R (A,
B, C, D).
• Alternatively, R1 and R2 are a lossless decomposition of R.
Yogendra Prasad P
Cont.
Fourth Normal Form (4NF):
If the database is in 4NF, it should satisfy the following
conditions:
i. Relation must be in BCNF.
ii. Relation must have no multi-valued dependency.
Yogendra Prasad P
Cont.
Example: <Student> Table
STU_ID COURSE HOBBY
21 ORACLE Dancing
21 PYTHON Singing
34 C Dancing
74 JAVA Cricket
59 C++ Hockey
Yogendra Prasad P
The given STUDENT table is in 3NF, but the COURSE and
HOBBY are two independent entity. Hence, there is no relationship
between COURSE and HOBBY.
Cont.
So to make the above table into 4NF, we can decompose it into two tables:
Relation 1: <Std_Course> Table Relation 2: <Std_Hobby> Table
STU_ID COURSE STU_ID HOBBY
21 ORACLE 21 Dancing
21 PYTHON 21 Singing
34 C 34 Dancing
74 JAVA 74 Cricket
Yogendra Prasad P
59 C++ 59 Hockey
Cont.
Fifth Normal Form (5NF):
• If the database is in 5NF, it should satisfy the following conditions:
i. Relation must be in 4NF.
ii. Relation should not contain any join dependency and joining
should be lossless.
• 5NF is satisfied when all the tables are broken into as many tables as
possible in order to avoid redundancy.
Yogendra Prasad P
• 5NF is also known as Project-join normal form (PJ/NF).