Slide 10- 1
Chapter 10
Functional Dependencies and
Normalization for Relational
Databases
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Informal Design Guidelines for Relational
Databases
◼ We first discuss informal guidelines for good relational
design
◼ Then we discuss formal concepts of functional
dependencies and normal forms
◼ - 1NF (First Normal Form)
◼ - 2NF (Second Normal Form)
◼ - 3NF (Third Normal Form)
◼ - BCNF (Boyce-Codd Normal Form)
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide 10- 3
1.1 Semantics of the Relation Attributes
◼ GUIDELINE 1: Informally, each tuple in a relation should
represent one entity or relationship instance.
◼ Attributes of different entities (EMPLOYEEs,
DEPARTMENTs, PROJECTs) should not be mixed in the
same relation
◼ Only foreign keys should be used to refer to other entities
◼ Entity and relationship attributes should be kept apart as
much as possible
◼ Bottom Line: Design a schema that can be explained
easily relation by relation. The semantics of attributes
should be easy to interpret.
Slide 10- 4
A simplified COMPANY relational
database schema
Slide 10- 5
1.2 Redundant Information in Tuples and
Update Anomalies
◼ When Information is stored redundantly
◼ Wastes storage
◼ Causes problems with update anomalies
◼ Insertion anomalies
◼ Deletion anomalies
◼ Modification anomalies
Slide 10- 6
Figure 10.3 Two relation schemas
suffering from update anomalies
Slide 10- 7
Figure 10.4 Example States for
EMP_DEPT and EMP_PROJ
Slide 10- 8
Guideline to Redundant Information in
Tuples and Update Anomalies
◼ GUIDELINE 2:
◼ Design a schema that does not suffer from the
insertion, deletion and update anomalies.
◼ If there are any anomalies present, then note them
so that applications can be made to take them into
account.
Slide 10- 9
EXAMPLE OF AN UPDATE ANOMALY
◼ Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
◼ Update Anomaly:
◼ Changing the name of project number P1 from
“Billing” to “Customer-Accounting” may cause this
update to be made for all 100 employees working
on project P1.
Slide 10- 10
EXAMPLE OF AN INSERT ANOMALY
◼ Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
◼ Insert Anomaly:
◼ Cannot insert a project unless an employee is
assigned to it.
◼ Conversely
◼ Cannot insert an employee unless an he/she is
assigned to a project.
Slide 10- 11
EXAMPLE OF AN DELETE ANOMALY
◼ Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
◼ Delete Anomaly:
◼ When a project is deleted, it will result in deleting all
the employees who work on that project.
◼ Alternately, if an employee is the sole employee on
a project, deleting that employee would result in
deleting the corresponding project.
Slide 10- 12
1.3 Null Values in Tuples
◼ GUIDELINE 3:
◼ Relations should be designed such that their
tuples will have as few NULL values as possible
◼ Attributes that are NULL frequently could be
placed in separate relations (with the primary key)
◼ Reasons for nulls:
◼ Attribute not applicable or invalid
◼ Attribute value unknown (may exist)
◼ Value known to exist, but unavailable
Slide 10- 13
3 Normal Forms Based on Primary Keys
◼ 3.1 Normalization of Relations
◼ 3.2 Practical Use of Normal Forms
◼ 3.3 Definitions of Keys and Attributes
Participating in Keys
◼ 3.4 First Normal Form
◼ 3.5 Second Normal Form
◼ 3.6 Third Normal Form
Slide 10- 14
3.1 Normalization of Relations
◼ Normalization:
◼ The process of decomposing unsatisfactory
"bad" relations by breaking up their attributes
into smaller relations
◼ Normal form:
◼ Condition using keys and Functional
Dependencies (FDs) of a relation to certify
whether a relation schema is in a particular
normal form
Slide 10- 15
3.2 Practical Use of Normal Forms
◼ Normalization is carried out in practice so that the
resulting designs are of high quality and meet the
desirable properties
◼ The practical utility of these normal forms becomes
questionable when the constraints on which they are
based are hard to understand or to detect
◼ The database designers need not normalize to the
highest possible normal form
(usually up to 3NF, BCNF or 4NF)
◼ Denormalization:
◼ The process of storing the join of higher normal form
relations as a base relation—which is in a lower normal
form
Slide 10- 16
3.2 First Normal Form
◼ Disallows
◼ multivalued attributes
◼ Considered to be part of the definition of relation
Slide 10- 17
Figure 10.8 Normalization into 1NF
Slide 10- 18
Slide 10- 19
2.1 Functional Dependencies
◼ Functional Dependencies (FDs)
◼ Are derived from the real-world constraints on
the attributes
◼ Are used to specify formal measures of the
"goodness" of relational designs
◼ Are constraints that are derived from the meaning
and interrelationships of the data attributes
◼ A set of attributes X functionally determines a set
of attributes Y if the value of X determines a
unique value for Y
Slide 10- 20
Functional Dependency (FD)
◼ A functional dependency is a constraint between two
sets of attributes from the database.
◼ A functional dependency, denoted by X → Y:
There is a functional dependency from X to Y, or that Y is
functionally dependent on X
This means that the values of the Y component of a tuple in
r depend on the values of the X component;
alternatively, the values of the X component of a tuple
uniquely (or functionally) determine the values of the Y
component. Slide 10- 21
Examples on FD
◼ Social security number determines employee
name
◼ SSN -> ENAME
◼ Project number determines project name and
location
◼ PNUMBER -> {PNAME, PLOCATION}
◼ Employee SSN and project number determines
the hours per week that the employee works on
the project
◼ {SSN, PNUMBER} -> HOURS
Slide 10- 22
3.3 Second Normal Form
◼ Uses the concepts of FDs, primary key
◼ Definitions
◼ Prime attribute: An attribute that is member of the primary
key K
◼ Full functional dependency: a FD Y -> Z where removal
of any attribute from Y means the FD does not hold any
more
◼ Examples:
◼ {SSN, PNUMBER} -> HOURS is a full FD since neither SSN
-> HOURS nor PNUMBER -> HOURS hold
◼ {SSN, PNUMBER} -> ENAME is not a full FD (it is called a
partial dependency ) since SSN -> ENAME also holds
Slide 10- 23
Second Normal Form
◼ Examples:
◼ {SSN, PNUMBER} -> HOURS is a full FD since
neither SSN -> HOURS nor PNUMBER -> HOURS hold
◼ {SSN, PNUMBER} -> ENAME is not a full FD
(it is called a partial dependency )
since SSN -> ENAME also holds
Slide 10- 24
Second Normal Form (Cont.)
◼ A relation schema R is in second normal form (2NF)
if every non-prime attribute A in R is fully
functionally dependent on the primary key
◼ R can be decomposed into 2NF relations via the
process of 2NF Normalization.
Slide 10- 25
Figure 10.10 Normalizing into 2NF
Slide 10- 26
3.4 Third Normal Form
◼ Definition:
◼ Transitive functional dependency:
a FD X -> Z that can be derived from two FDs:
X -> Y and Y -> Z
◼ Examples:
◼ SSN -> DMGRSSN is a transitive FD
◼ Since SSN -> DNUMBER
and DNUMBER -> DMGRSSN hold
◼ SSN -> ENAME is non-transitive
◼ Since there is no set of attributes X
where SSN -> X and X -> ENAME
Slide 10- 27
Third Normal Form (2)
◼ A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively
dependent on the primary key
◼ R can be decomposed into 3NF relations via the process
of 3NF normalization
◼ NOTE:
◼ In X -> Y and Y -> Z, with X as the primary key, we consider
this a problem only if Y is not a candidate key.
◼ When Y is a candidate key, there is no problem with the
transitive dependency .
◼ e.g., Consider EMP (SSN, Emp#, Salary)
Here, SSN -> Emp# -> Salary
and Emp# is a candidate key.
Slide 10- 28
EMP_DEPT Relation
Slide 10- 29
Figure 10.10 Normalizing into 3NF
Slide 10- 30
Normal Forms Defined Informally
◼ 1st normal form
◼ All attributes depend on the key
◼ 2nd normal form
◼ All attributes depend on the whole key
◼ 3rd normal form
◼ All attributes depend on nothing but the key
◼ Each normal form is strictly stronger than
the previous one
◼ Every 2NF relation is in 1NF
◼ Every 3NF relation is in 2NF
Slide 10- 31
Slide 10- 32
Slide 10- 33
Slide 10- 34
Summary
Slide 10- 35