[go: up one dir, main page]

0% found this document useful (0 votes)
9 views45 pages

Module-4 Schema Refinement

The document discusses schema refinement, focusing on normalization and decomposition as techniques to eliminate data redundancy in databases. It highlights the problems caused by redundancy, such as update, insertion, and deletion anomalies, and explains the importance of functional dependencies and various normal forms. Additionally, it covers the properties of decomposition, emphasizing lossless decomposition and dependency preservation.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views45 pages

Module-4 Schema Refinement

The document discusses schema refinement, focusing on normalization and decomposition as techniques to eliminate data redundancy in databases. It highlights the problems caused by redundancy, such as update, insertion, and deletion anomalies, and explains the importance of functional dependencies and various normal forms. Additionally, it covers the properties of decomposition, emphasizing lossless decomposition and dependency preservation.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 45

Module - IV

Schema Refinement

Yogendra Prasad P
Introduction
Schema Refinement:
• The Schema Refinement refers to refine the schema by using some
technique. The best technique of schema refinement is
decomposition.
• Normalization or Schema Refinement is a technique of organizing
the data in the database.

Yogendra Prasad P
• It is a systematic approach of decomposing tables to eliminate data
redundancy
Cont.
Data Redundancy:
The same amount of data is held in more than one case with
in a database is called data redundancy.

Problems caused by redundancy:


Storing the same information redundantly, that is, in more
than one place within a database can lead to several problems:

Yogendra Prasad P
• Redundant Storage: Some information is stored repeatedly.
• Update Anomalies: If one copy of such repeated data is updated,
an inconsistency is created unless all copies are similarly updated.
Cont.
• Insertion Anomalies: It may not be possible to store certain
information unless some other information is stored.
• Deletion Anomalies: It may not be possible to delete certain
information without losing some other information.

SID SName Rating Daily_Wage Worked_Hours


1 PK 8 10 32
2 VK 8 10 41
3 Kalyan 7 5 30

Yogendra Prasad P
4 Viraaj 7 5 25
5 Avinash 8 10 35

Table: Instance of an Hourly_Emp Relation


Decomposition
• To avoid redundancy and problems due to redundancy, we
use refinement technique called Decomposition.
• It is a process of decomposing a larger relation into smaller
relations.
• Each of smaller relations contain subset of attributes of
original relation.
If there is no proper decomposition of relations then it may

Yogendra Prasad P

lead to problem like loss of information.
Cont.
• We can decompose Hourly_Emp relation into two relations.
1. Hourly_Emp1 (SID, SName, Rating, Worked_hours)
2. Wage (Rating, Daily_Wage)

• Here we can easily record the daily wage for any rating simply
by adding a tuple to daily wage, even if no employee with that
rating appears in the current instance of hourly employee.

Yogendra Prasad P
Cont.
Problems related to decomposition:
• Whenever we are going for decomposition, we must be careful
because decomposition a relational schema can create more
problems than it solves.
• Two important questions must be asked repeatedly
1. Do we need to decompose a relation?
2. What problem (if any) does given decomposition cause?

Yogendra Prasad P
Functional Dependencies

• Functional dependency is a relationship that exist when one


attribute uniquely determines another attribute.
• Functional dependency is a set of constraints between two
attributes in a relation.
• A functional dependency AB in a relation holds true if two
tuples having the same value of attribute A also have the
same value of attribute B

Yogendra Prasad P
Cont.
• If t1.X=t2.X then t1.Y=t2.Y where t1,t2 are tuples and X,Y are
attribues.
• Functional dependency are used to determine the keys in a relation.

B is functionally dependent on A
A B
Determinant Dependant

Yogendra Prasad P
Cont.
Reasoning about functional dependencies:
Types of functional dependencies:

1. Trivial functional dependency: If X  Y is a functional


dependency where Y subset X, these type of FD’s called as trivial
functional dependency.
2. Non-trivial functional dependency: If X  Y and Y is not
subset of X then it is called non-trivial functional dependency.

Yogendra Prasad P
3. Completely non-trivial functional dependency: If X  Y
and X∩Y=Ф(null) then it is called completely non-trivial
functional dependency.
Cont.
Armstrong Axioms/ Inference rules:
Armstrong axioms defines the set of rules for reasoning about
functional dependencies and also to infer all the functional
dependencies on a relational database.

Various axioms rules or inference rules:


Primary axioms:
1. Reflexivity Rule:

Yogendra Prasad P
 If X is a set of attributes then X  X holds
 If X is a set of attributes and Y is a subset of X then X  Y holds.
Cont.
2. Augmentation Rule:
If X  Y holds and Z is a set of attributes then XZ  YZ holds.
3. Transitivity Rule:
If X  Y holds and Y  Z holds then X  Z holds.
Secondary or Derived axioms:
1. Union Rule:
If X  Y holds and X  Z holds then X  YZ holds.
2. Decomposition Rule:

Yogendra Prasad P
If X  YZ holds then X  Y and X  Z holds.
3. Pseudo Transitivity Rule:
If X  Y holds and XZ  W holds then YZ  W holds.
Attribute Closure
• Attribute closure of an attribute set can be defined as set of
attributes which can be functionally determined from it.
• By using attribute closure we can find the Super keys and
Candidate keys in a relation.

• Attribute closure for any attribute can be denoted by X


+ .
Note:
To find attribute closure of an attribute set:

Yogendra Prasad P
1. Add elements of attribute set to the result set.
2. Recursively add elements to the result set which can be functionally
determined from the elements of result set.
Cont.
Example 1: Consider the following list of FD’s and compute the
closure for AG.
R (A, B, C, G, H,I)
F= { A B, A C, CG H, CG I, B H}
Sol:
ITERATION CLOSURE USING
1 AG
2 AGB A B

Yogendra Prasad P
3 AGBC A C
4 ABCGH B H
5 ABCGHI CG I
Cont.
Example 2: Consider the following FD’s and calculate A+ with

R (A, B, C, D)

F= {A B, B C, AB D}

Example 3: Compute the closures for relational schema

R (A, B, C, D, E)

F= {A BC, CD  E, B  D, E  A} and list the keys of R.

Yogendra Prasad P
Cont.
Example 4: Let R (A, B, C, D, E) is a relational schema with functional
dependencies

F= {AB C, DE  B, CD E}

find the closure for


i. B
ii. E
iii. ACE and
iv. BD

Yogendra Prasad P
Example 5: Compute the closures for the set of FD’s

R (A, B, G, H, I)

F= {A B, G H, G I, B H} and list the keys of R.


Normalization
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize the redundancy from a relation
or set of relations.
• It is also used to eliminate the undesirable characteristics like
Insertion, Update and Deletion Anomalies.
• Normalization divides the larger table into the smaller table and
links them using relationship.

Yogendra Prasad P
• The normal form is used to reduce redundancy from the database
table.
Cont.
Types of Normal Forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. Boyce-Codd Normal Form
5. Fourth Normal Form

Yogendra Prasad P
6. Fifth Normal Form
Cont.
First Normal Form (1NF):
If the database is in 1NF, it should satisfies the
following conditions:
i. Contains only atomic values.
ii. There are no repeating groups.

Yogendra Prasad P
Cont.
Example: Consider the following relation
SID SName SAddress SMobile
8812121212
101 PK Tirupati
9900012222
102 VK Bangalore 8912312390
103 Viraaj Hyderabad 7778881212
104 Kalyan Tirupati 8881877752
9990000123
105 Avinash Bangalore

Yogendra Prasad P
8123450987

This relation is not in 1 NF because it is having non atomic values


Cont.
We re-arrange the relation as below, to convert it to First Normal Form.
SID SName SAddress SMobile

101 PK Tirupati 8812121212

101 PK Tirupati 9900012222

102 VK Bangalore 8912312390

103 Kalyan Hyderabad 7778881212

104 Viraaj Tirupati 8881877752

Yogendra Prasad P
105 Avinash Bangalore 9990000123

105 Avinash Bangalore 8123450987


Cont.
Second Normal Form (2NF):
If the database is in 2NF, it should satisfies the following
conditions:
i. Relation must be in 1NF.
ii. All non-key attributes are fully functional dependent on the
primary key.
An attribute that is not part of any candidate key is known as

Yogendra Prasad P
non-prime attribute.
Cont.
Example: Consider the following relation

Project_Nam
Std_ID Project_ID Std_Name
e

• Here in the Student_Project relation the prime key attributes are


Std_ID and Project_ID.
• According to the rule, non-key attributes, i.e. Std_Name and
Project_Name must be dependent upon both and not on any of the
prime key attribute individually.

Yogendra Prasad P
• But we find that Std_Name can be identified by Std_ID and
Project_Name can be identified by Project_ID independently.
Cont.
• This is called partial dependency, which is not allowed in 2NF.
• We broke the relation in two as depicted as below. So there exists no
partial dependency.
Student
Project_Nam
Std_ID Std_Name
e
Project

Yogendra Prasad P
Project_ID Project_Name
Cont.
Third Normal Form (3NF):
If the database is in 3NF, it should satisfy the following
conditions:
i. Relation must be in 2NF.
ii. Transitive functional dependency of non-prime attribute
on any super key should be removed.

Yogendra Prasad P
Cont.
Example: Consider the following relation

Std_ID Std_Name City Zip

• Here in the Student_Detail relation, Std_ID is the key and only


prime key attribute.
• We find that City can be identified by Std_ID as well as Zip itself.
• Neither Zip is a superkey nor is City a prime attribute.

Yogendra Prasad P
• Additionally, Std_ID → Zip → City, so there exists transitive
dependency.
Cont.
• To bring this relation into third normal form, we break the relation
into two relations as follows

Student_Inf
Std_ID Std_Name Zip

ZipCode

Yogendra Prasad P
Zip City
Cont.

Boyce-Codd Normal Form (BCNF):


• BCNF is the advance version of 3NF. It is stricter than 3NF.
• A relation is in BCNF if every functional dependency X → Y, X is
the super key of the table.
• For BCNF, the table should be in 3NF, and for every FD, LHS is
super key.

Yogendra Prasad P
Cont.
Example: Consider the following relation

Std_Colle Std_Secti Std_Dept


Std_ID Std_Dept
ge on no

In the above table Functional dependencies are as follows:


1. Std_ID → Std_College
2. Std_Dept→ {Std_Section, Std_Deptno}

Yogendra Prasad P
The table is not in BCNF because neither Std_ID nor
Std_Dept alone are super keys.
Cont.
• To convert the given table into BCNF, we decompose it into three
tables:
Student_College

Std_ID Std_College

Student_Section
Std_Dept Std_Section Std_Deptno

Yogendra Prasad P
Student_Dept
Std_ID Std_Dept
Properties of Decomposition
• Decomposition is the process of breaking down in parts or elements.
• It replaces a relation with a collection of smaller relations.
• It breaks the table into multiple tables in a database.
• It should always be lossless, because it confirms that the
information in the original relation can be accurately reconstructed
based on the decomposed relations.
• If there is no proper decomposition of the relation, then it may lead

Yogendra Prasad P
to problems like loss of information.
Cont.
Following are the properties of Decomposition:
1. Lossless Decomposition
2. Dependency Preservation
Lossless Decomposition:
• If the information is not lost from the relation that is decomposed,
then the decomposition will be lossless.

Yogendra Prasad P
• The relation is said to be lossless decomposition if natural joins of
all the decomposition give the original relation.
Cont.
Example: <Employee_Department> Table

Eid Ename Age City Salary Depti DeptNam


d e
E001 PK 21 Tirupati 20000 D001 Finance
E002 VK 32 Tirupati 30000 D002 Production
E003 Lokesh 25 Hyderabad 5000 D003 Sales
E004 Venkatesh 24 Hyderabad 4000 D004 Marketing
E005 Avinash 25 Bangalore 25000 D005 HR

Yogendra Prasad P
Decompose the above relation into two relations to check whether a
decomposition is lossless or lossy.
Cont.
Relation 1: <Employee> Table Relation 2: <Department> Table
Eid Ename Age City Salary Deptid Eid DeptName

E001 PK 21 Tirupati 20000 D001 E001 Finance


E002 VK 32 Tirupati 30000 D002 E002 Production
E003 Kalyan 25 Hyderabad 5000 D003 E003 Sales
D004 E004 Marketing
E004 Vijaaj 24 Hyderabad 4000 D005 E005 HR

E005 Avinash 25 Bangalore 25000

Yogendra Prasad P
Cont.
• So, the above decomposition is a Lossless Join Decomposition, because the
two relations contains one common field that is 'Eid' and therefore join is
possible.

• Now apply natural join on the decomposed relations.


Employee ⋈ Department:
Eid Ename Age City Salary Depti DeptNam
d e
E001 PK 21 Tirupati 20000 D001 Finance
E002 VK 32 Tirupati 30000 D002 Production

Yogendra Prasad P
E003 Kalyan 25 Hyderabad 5000 D003 Sales
E004 Viraaj 24 Hyderabad 4000 D004 Marketing
E005 Avinash 25 Bangalore 25000 D005 HR
Cont.
Dependency Preservation:
• Dependency is an important constraint on the database.
• Everydependency must be satisfied by at least one
decomposed table.
• If
a relation R is decomposed into relation R1 and R2, then
the dependencies of R either must be a part of R1 or R2 or
must be derivable from the combination of functional

Yogendra Prasad P
dependencies of R1 and R2.
Cont.
Example:
• Suppose there is a relation R (A, B, C, D) with functional dependency
set {A->BC}.
• The relational R is decomposed into R1(A,B) and R2(A,C,D) which is
dependency preserving because FD A->BC is a part of relation
R1(ABC).

Yogendra Prasad P
Cont.
Multivalued Dependency:
• Multivalued dependency occurs when two attributes in a table are
independent of each other but, both depend on a third attribute.
• A multivalued dependency consists of at least two attributes that
are dependent on a third attribute that's why it always requires at
least three attributes.

Yogendra Prasad P
Cont.
Example:
Suppose there is a bike manufacturer company which produces
two colors(white and black) of each model every year.
BIKE_MODEL MANUF_YEAR COLOR
M2011 2008 White
M2001 2008 Black
M3001 2013 White
M3001 2013 Black
M4006 2017 White

Yogendra Prasad P
M4006 2017 Black

Here columns COLOR and MANUF_YEAR are dependent on


BIKE_MODEL and independent of each other.
Cont.
• Here columns COLOR and MANUF_YEAR are dependent on
BIKE_MODEL and independent of each other.
• In this case, these two columns can be called as multivalued
dependent on BIKE_MODEL. The representation of these
dependencies is shown below:
BIKE_MODEL → → MANUF_YEAR
BIKE_MODEL → → COLOR
This can be read as "BIKE_MODEL multidetermined

Yogendra Prasad P

MANUF_YEAR" and "BIKE_MODEL multidetermined COLOR".
Cont.
Join Dependency:
• Join decomposition is a further generalization of Multivalued
dependencies.
• If the join of R1 and R2 over C is equal to relation R, then we can
say that a join dependency (JD) exists. Where R1 and R2 are the
decompositions R1(A, B, C) and R2(C, D) of a given relations R (A,
B, C, D).
• Alternatively, R1 and R2 are a lossless decomposition of R.

Yogendra Prasad P
Cont.
Fourth Normal Form (4NF):
If the database is in 4NF, it should satisfy the following
conditions:
i. Relation must be in BCNF.
ii. Relation must have no multi-valued dependency.

Yogendra Prasad P
Cont.
Example: <Student> Table

STU_ID COURSE HOBBY


21 ORACLE Dancing
21 PYTHON Singing
34 C Dancing
74 JAVA Cricket
59 C++ Hockey

Yogendra Prasad P
The given STUDENT table is in 3NF, but the COURSE and
HOBBY are two independent entity. Hence, there is no relationship
between COURSE and HOBBY.
Cont.
So to make the above table into 4NF, we can decompose it into two tables:
Relation 1: <Std_Course> Table Relation 2: <Std_Hobby> Table

STU_ID COURSE STU_ID HOBBY


21 ORACLE 21 Dancing
21 PYTHON 21 Singing
34 C 34 Dancing
74 JAVA 74 Cricket

Yogendra Prasad P
59 C++ 59 Hockey
Cont.
Fifth Normal Form (5NF):
• If the database is in 5NF, it should satisfy the following conditions:
i. Relation must be in 4NF.
ii. Relation should not contain any join dependency and joining
should be lossless.
• 5NF is satisfied when all the tables are broken into as many tables as
possible in order to avoid redundancy.

Yogendra Prasad P
• 5NF is also known as Project-join normal form (PJ/NF).

You might also like