[go: up one dir, main page]

0% found this document useful (0 votes)
7 views67 pages

Unit 3 Normalization

Uploaded by

manjula
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views67 pages

Unit 3 Normalization

Uploaded by

manjula
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 67

Chapter 10

Functional Dependencies and


Normalization for Relational
Databases

Copyright © 2004 Pearson Education, Inc.


Chapter Outline
1 Informal Design Guidelines for Relational Databases
1.1Semantics of the Relation Attributes
1.2 Redundant Information in Tuples and Update Anomalies
1.3 Null Values in Tuples
1.4 Spurious Tuples
2 Functional Dependencies (FDs)
2.1 Definition of FD
2.2 Inference Rules for FDs
2.3 Equivalence of Sets of FDs
2.4 Minimal Sets of FDs

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-3


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Chapter Outline(contd.)
3 Normal Forms Based on Primary Keys
3.1 Normalization of Relations
3.2 Practical Use of Normal Forms
3.3 Definitions of Keys and Attributes Participating in Keys
3.4 First Normal Form
3.5 Second Normal Form
3.6 Third Normal Form
4 General Normal Form Definitions (For Multiple
Keys)
5 BCNF (Boyce-Codd Normal Form)
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-4
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
1 Informal Design Guidelines for
Relational Databases (1)

What is relational database design?


The grouping of attributes to form "good" relation schemas
 Two levels of relation schemas
– Logical (or conceptual) level - how users interpret the
relation schemas and the meaning of their attributes.
– Implementation (or physical storage) level - how the
tuples in a base relation are stored and updated
 Design is concerned mainly with base relations
 What are the criteria for "good" base relations?

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-5


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Informal Design Guidelines for
Relational Databases (2)
We first discuss informal guidelines for good
relational design
Then we discuss formal concepts of functional
dependencies and normal forms
- 1NF (First Normal Form)
- 2NF (Second Normal Form)
- 3NF (Third Normal Form)
- BCNF (Boyce-Codd Normal Form)
Additional types of dependencies, further normal
forms, relational design algorithms by synthesis are
discussed in Chapter 11
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-6
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Informal Design Guidelines for
Relational Databases (3)

Four informal guidelines that may be used as measures to


determine the quality of relation schema design:
■ Making sure that the semantics of the attributes is clear
in the schema
■ Reducing the redundant information in tuples
■ Reducing the NULL values in tuples
■ Disallowing the possibility of generating spurious tuples

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-7


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Imparting Clear Semantics to
Attributes in Relations
Whenever we group attributes to form a relation
schema, we assume that attributes belonging to one
relation have certain real-world meaning and a
proper interpretation associated with them. The
semantics of a relation refers to its meaning resulting
from the interpretation of attribute values in a tuple.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-8


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
A simplified COMPANY relational database schema

Note: The above figure is now called Figure 10.1 in Edition 4


Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-9
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-10
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Guideline 1. Design a relation schema so that it is easy to
explain its meaning. Do not combine attributes from
multiple entity types and relationship types into a single
relation. Intuitively, if a relation schema corresponds to
one entity type or one relationship type, it is
straightforward to explain its meaning. Otherwise, if the
relation corresponds to a mixture of multiple entities and
relationships, semantic ambiguities will result and the
relation cannot be easily explained.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-11


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
1.1 Semantics of the Relation Attributes
GUIDELINE 1: Informally, each tuple in a relation
should represent one entity or relationship instance.
(Applies to individual relations and their attributes).
 Attributes of different entities (EMPLOYEEs, DEPARTMENTs,
PROJECTs) should not be mixed in the same relation
 Only foreign keys should be used to refer to other entities
 Entity and relationship attributes should be kept apart as much as
possible.
Bottom Line: Design a schema that can be explained
easily relation by relation. The semantics of
attributes should be easy to interpret.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-12


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Exs of violating guideline 1

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-13


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
1.2 Redundant Information in
Tuples and Update Anomalies
One goal of schema design  to minimize the storage space used by the base
relations
For example, compare the space used by the two base relations EMPLOYEE and
DEPARTMENT with that for an EMP_DEPT base relation which is the result of
applying the NATURAL JOIN operation to EMPLOYEE and DEPARTMENT. In
EMP_DEPT, the attribute values pertaining to a particular department (Dnumber,
Dname, Dmgr_ssn) are repeated for every employee who works for that
department. In contrast, each department’s information appears only once in the
DEPARTMENT relation. Only the department number (Dnumber) is repeated in
the EMPLOYEE relation for each employee who works in that department as a
foreign key. Similar comments apply to the EMP_PROJ relation, which augments
the WORKS_ON relation with additional attributes from EMPLOYEE and
PROJECT

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-14


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
1.2 Redundant Information in Tuples and
Update Anomalies
Mixing attributes of multiple entities may cause problems
Information is stored redundantly wasting storage
Storing natural joins of base relations leads to an additional
problem referred to as UPDATE ANOMALIES.
These can be classified into:
– Insertion anomalies
– Deletion anomalies
– Modification anomalies

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-15


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Insertion Anomalies
Insertion anomalies can be differentiated into two types:
Ex - EMP_DEPT relation:
■ To insert a new employee tuple into EMP_DEPT, we must
include either the attribute values for the department that the
employee works for, or NULLs (if the employee does not work for
a department as yet).
■ It is difficult to insert a new department that has no employees as
yet in the EMP_DEPT relation. The only way to do this is to place
NULL values in the attributes for employee. This violates the entity
integrity for EMP_DEPT because its primary key Ssn cannot be
null. Moreover, when the first employee is assigned to that
department, we do not need this tuple with NULL values anymore.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-16


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Deletion and Modification Anomalies
Deletion Anomalies: If we delete from EMP_DEPT an employee tuple
that happens to represent the last employee working for a particular
department, the information concerning that department is lost
inadvertently from the database. This problem does not occur in the
database of Figure 14.2 because DEPARTMENT tuples are stored
separately

Modification Anomalies: In EMP_DEPT, if we change the value of


one of the attributes of a particular department—say, the manager of
department 5—we must update the tuples of all employees who work in
that department; otherwise, the database will become inconsistent. If we
fail to update some tuples, the same department will be shown to have
two different values for manager in different employee tuples, which
would be wrong.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-17


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
EXAMPLE OF AN UPDATE
ANOMALY (1)
Consider the relation:
EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)

Update Anomaly: Changing the name of project


number P1 from “Billing” to “Customer-
Accounting” may cause this update to be made for
all 100 employees working on project P1.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-18


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
EXAMPLE OF AN UPDATE
ANOMALY (2)
Insert Anomaly: Cannot insert a project unless
an employee is assigned to .
Inversely - Cannot insert an employee unless an
he/she is assigned to a project.
 Delete Anomaly: When a project is deleted, it
will result in deleting all the employees who work
on that project. Alternately, if an employee is the
sole employee on a project, deleting that employee
would result in deleting the corresponding project.
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-19
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Two relation schemas suffering from update anomalies

Note: The above figure is now called Figure 10.3 in Edition 4


Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-20
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Example States for EMP_DEPT and EMP_PROJ

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-21


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Guideline to Redundant Information
in Tuples and Update Anomalies
GUIDELINE 2: Design a schema that does not
suffer from the insertion, deletion and update
anomalies. If there are any present, then note them
so that applications can be made to take them into
account

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-22


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
1.3 Null Values in Tuples

Reasons for nulls:


 attribute not applicable or invalid
 attribute value unknown (may exist)
 value known to exist, but unavailable

Problems with NULLs:


 Storage space
 Understanding the meaning
 How to account for them during aggregate functions?

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-23


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
1.3 Null Values in Tuples

GUIDELINE 3: Avoid placing attributes in a


base relation whose values may frequently be
NULL. Relations should be designed such that their
tuples will have as few NULL values as possible

 Attributes that are NULL frequently could be


placed in separate relations (with the primary key)

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-24


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Ex – No Spurious Tuples

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-25


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Ex – Spurious Tuples

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-26


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
1.4 Spurious Tuples

GUIDELINE 4:

Design relation schemas so that they can be joined with equality


conditions on attributes that are appropriately related (primary
key, foreign key) pairs in a way that guarantees that no spurious
tuples are generated. Avoid relations that contain matching
attributes that are not (foreign key, primary key) combinations
because joining on such attributes may produce spurious tuples.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-27


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
2.1 Functional Dependencies (1)
Functional dependencies (FDs) are used as formal measures for
"goodness" of relational database design

FDs and keys are used to define normal forms for relations

A functional dependency is a constraint between two sets of


attributes from the database. A set of attributes X functionally
determines a set of attributes Y if the value of X determines a
unique value for Y. XY

FDs are constraints that are derived from the meaning and
interrelationships of the data attributes

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-28


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Functional Dependencies (2)

 X -> Y holds if whenever two tuples have the same value


for X, they must have the same value for Y. For any two
tuples t1 and t2 in any relation instance r(R): If
t1[X]=t2[X], then t1[Y]=t2[Y]

 X -> Y in R specifies a constraint on all relation instances


r(R) since it is a property of the relation schema R.

 FDs are derived from the real-world constraints on the


attributes

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-29


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Examples of FD constraints (1)

social security number determines employee name


SSN -> ENAME
project number determines project name and
location
PNUMBER -> {PNAME, PLOCATION}
employee ssn and project number determines the
hours per week that the employee works on the
project
{SSN, PNUMBER} -> HOURS

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-30


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Examples of FD constraints (2)

An FD is a property of the attributes in the schema R


The constraint must hold on every relation instance
r(R)
If K is a key of R, then K functionally determines all
attributes in R (since we never have two distinct
tuples with t1[K]=t2[K])

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-31


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
2.2 Inference Rules for FDs (1)
Given a set of FDs F, we can infer additional FDs that
hold whenever the FDs in F hold

Armstrong's inference rules:


IR1. (Reflexive) If Y subset-of X, then X -> Y
IR2. (Augmentation) If X -> Y, then XZ -> YZ
(Notation: XZ stands for X U Z)
IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z

 IR1, IR2, IR3 form a sound and complete set of


inference rules

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-32


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Inference Rules for FDs (2)

Some additional inference rules that are useful:


(Decomposition) If X -> YZ, then X -> Y and X -> Z
(Union) If X -> Y and X -> Z, then X -> YZ
(Psuedotransitivity) If X -> Y and WY -> Z, then WX ->
Z

 The last three inference rules, as well as any other


inference rules, can be deduced from IR1, IR2,
and IR3 (completeness property)

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-33


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Inference Rules for FDs (3)

Closure of a set F of FDs is the set F+ of all FDs


that can be inferred from F

Closure of a set of attributes X with respect to F is


the set X + of all attributes that are functionally
determined by X

X + can be calculated by repeatedly applying IR1,


IR2, IR3 using the FDs in F

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-34


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
2.3 Equivalence of Sets of FDs

Two sets of FDs F and G are equivalent if:


- every FD in F can be inferred from G, and
- every FD in G can be inferred from F
Hence, F and G are equivalent if F + =G +
Definition: F covers G if every FD in G can be
inferred from F (i.e., if G + subset-of F +)
F and G are equivalent if F covers G and G covers
F
There is an algorithm for checking equivalence of
sets of FDs
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-35
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
2.4 Minimal Sets of FDs (1)

 A set of FDs is minimal if it satisfies the


following conditions:
(1) Every dependency in F has a single attribute for its RHS.
(2) We cannot remove any dependency from F and have a
set of dependencies that is equivalent to F.
(3) We cannot replace any dependency X -> A in F with a
dependency Y -> A, where Y proper-subset-of X
( Y subset-of X) and still have a set of dependencies that
is equivalent to F.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-36


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Minimal Sets of FDs (2)

Every set of FDs has an equivalent minimal set


There can be several equivalent minimal sets
There is no simple algorithm for computing a
minimal set of FDs that is equivalent to a set F of
FDs
To synthesize a set of relations, we assume that
we start with a set of dependencies that is a
minimal set (e.g., see algorithms 11.2 and 11.4)

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-37


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
3 Normal Forms Based on Primary
Keys
3.1 Normalization of Relations
3.2 Practical Use of Normal Forms
3.3 Definitions of Keys and Attributes
Participating in Keys
3.4 First Normal Form
3.5 Second Normal Form
3.6 Third Normal Form

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-38


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
The normalization process, as first proposed by Codd (1972a), takes a relation
schema through a series of tests to certify whether it satisfies a certain normal
form. The process, which proceeds in a top-down fashion by evaluating each
relation against the criteria for normal forms and decomposing relations as
necessary, can thus be considered as relational design by analysis. Initially,
Codd proposed three normal forms, which he called first, second, and third
normal form. A stronger definition of 3NF—called Boyce-Codd normal form
(BCNF)—was proposed later by Boyce and Codd. All these normal forms are
based on a single analytical tool: the functional dependencies among the
attributes of a relation. Later, a fourth normal form (4NF) and a fifth normal
form (5NF) were proposed, based on the concepts of multivalued dependencies
and join dependencies, respectively. 3NF relations may be generated from a
given set of FDs using relational design by synthesis.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-39


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Normalization of data can be considered a process of
analyzing the given relation schemas based on their
FDs and primary keys to achieve the desirable
properties of
(1) minimizing redundancy and
(2) minimizing the insertion, deletion, and update
anomalies

The normal form of a relation refers to the highest


normal form condition that it meets, and hence
indicates the degree to which it has been normalized.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-40


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
3.1 Normalization of Relations (1)
Normalization: The process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations

Normal form: Condition using keys and FDs of a


relation to certify whether a relation schema is in a
particular normal form

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-41


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Normalization of Relations (2)

2NF, 3NF, BCNF based on keys and FDs of a


relation schema
4NF based on keys, multi-valued dependencies :
MVDs; 5NF based on keys, join dependencies :
JDs (Chapter 11)
Additional properties may be needed to ensure a
good relational design (lossless join, dependency
preservation;)

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-42


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Properties of decompositions

There are two important properties of decompositions:


(a) Lossless join or non-additive join – guarantees that
spurious tuples are not generated
(b) Dependency preservation – preservation of the
functional dependencies

Note that property (a) is extremely important and cannot


be sacrificed. Property (b) is less stringent and may
be sacrificed.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-43


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
3.2 Practical Use of Normal Forms
 Normalization is carried out in practice so that the resulting designs are of
high quality and meet the desirable properties.

 Although higher normal forms (4NF and 5NF) are defined, the practical
utility of these normal forms becomes questionable when the constraints on
which they are based are hard to understand or to detect.  upto
3NF/BCNF/4NF

 The database designers need not normalize to the highest possible normal
form. (usually up to 3NF, BCNF or 4NF). Relations maybe left in a lower
normal form such as 2NF, for performance reasons.

 Denormalization: the process of storing the join of higher normal form


relations as a base relation—which is in a lower normal form

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-44


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
3.3 Definitions of Keys and Attributes
Participating in Keys (1)
A superkey of a relation schema R = {A1, A2, ....,
An} is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional


property that removal of any attribute from K will
cause K not to be a superkey any more.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-45


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Definitions of Keys and Attributes
Participating in Keys (2)
If a relation schema has more than one key, each
is called a candidate key. One of the candidate
keys is arbitrarily designated to be the primary
key, and the others are called secondary keys.
A Prime attribute must be a member of some
candidate key
A Nonprime attribute is not a prime attribute—
that is, it is not a member of any candidate key.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-46


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
3.2 First Normal Form

Disallows composite attributes, multivalued


attributes, and nested relations; attributes
whose values for an individual tuple are
non-atomic.

The only attribute values permitted by 1NF


are single atomic (or indivisible) values.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-47


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Normalization into 1NF

Note: The above figure is now called Figure 10.8 in Edition 4


Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-48
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
3.2 First Normal Form (1NF)
Approaches to achieve 1NF:
1)Remove the attribute (A) that violates 1NF and place it in a
separate relation along with the primary (P) of the relation. The
primary key of the new relation will be (P, A).
2)Expand the key (P) so that there will be a separate tuple in the
original relation. The primary key will now be (P, A)
Disadvantage – introduces redundancy
3)If max no of values is known for the attribute (For ex-max 3
values), replace the attribute A by three atomic attributes: A1, A2
and A3.
Disadvantage – NULL values if values are less than max

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-49


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Normalization nested relations into 1NF

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
3.2 First Normal Form (1NF)
More than one multivalued attribute:
Ex for a non-1NF relation:
PERSON( SSN, {Car_Lic#}, {Phone#})

Approach 1:
PPERSON_IN_1NF(SSN, Car_Lic#, Phone#)

Approach 2:
P1(SSN, Car_Lic#) and P2(SSN, Phone#)

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-51


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
3.3 Second Normal Form (1)
 Uses the concepts of FDs, primary key

Definitions:
Prime attribute - attribute that is member of the
key K
Full functional dependency - a FD Y -> Z
where removal of any attribute from Y means the
FD does not hold any more
Examples: - {SSN, PNUMBER} -> HOURS is a full FD
since neither SSN -> HOURS nor PNUMBER -> HOURS hold

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-52


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
3.3 Second Normal Form (2)
Partial dependency - a FD Y -> Z is a partial
dependency if some attribute of Y can be removed
and the FD still holds.
Example
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called
a partial dependency ) since SSN -> ENAME also holds

The test for 2NF involves testing for functional dependencies


whose left-hand side attributes are part of the primary key.
If the primary key contains a single attribute, the test need
not be applied at all.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-53


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Second Normal Form (3)

A relation schema R is in second normal


form (2NF) if every non-prime attribute A
in R is fully functionally dependent on the
primary key

R can be decomposed into 2NF relations


via the process of 2NF normalization

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-54


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
2NF Normalization

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-55


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
3.4 Third Normal Form (1)

Definition:
Transitive functional dependency - a FD X -> Z
that can be derived from two FDs X -> Y and Y -> Z
Examples:
SSN -> DMGRSSN is a transitive FD since
SSN -> DNUMBER and DNUMBER -> DMGRSSN hold

SSN -> ENAME is non-transitive since there is no set of


attributes X where SSN -> X and X -> ENAME

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-56


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Third Normal Form (2)

A relation schema R is in third normal form


(3NF) if it is in 2NF and no non-prime attribute A
in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the


process of 3NF normalization

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-57


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
3NF Normalization

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-58


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
4 General Normal Form Definitions
(For Multiple Keys) (1)
The above definitions consider the primary key
only
The following more general definitions take into
account relations with multiple candidate keys
A relation schema R is in second normal form
(2NF) if every non-prime attribute A in R is fully
functionally dependent on every key of R

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-59


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
General Normal Form Definitions (2)

Definition:
Superkey of relation schema R - a set of attributes
S of R that contains a key of R
A relation schema R is in third normal form
(3NF) if whenever a FD X -> A holds in R, then
either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R
NOTE: Boyce-Codd normal form disallows condition (b)
above
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-60
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-61
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-62
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
5 BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal


Form (BCNF) if whenever an FD X -> A holds
in R, then X is a superkey of R
 Each normal form is strictly stronger than the previous one
– Every 2NF relation is in 1NF
– Every 3NF relation is in 2NF
– Every BCNF relation is in 3NF
 There exist relations that are in 3NF but not in BCNF
 The goal is to have each relation in BCNF (or 3NF)

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-63


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Boyce-Codd normal form

Note: The above figure is now called Figure 10.12 in Edition 4


Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-64
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
A relation TEACH that is in 3NF but not in BCNF

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-65


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Achieving the BCNF by Decomposition (1)

 Two FDs exist in the relation TEACH:


fd1: { student, course} -> instructor
fd2: instructor -> course
 {student, course} is a candidate key for this relation and that
the dependencies shown follow the pattern in Figure 10.12
(b). So this relation is in 3NF but not in BCNF
 A relation NOT in BCNF should be decomposed so as to
meet this property, while possibly forgoing the preservation
of all functional dependencies in the decomposed relations.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-66


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Achieving the BCNF by Decomposition (2)
 Three possible decompositions for relation TEACH
1. {student, instructor} and {student, course}
2. {course, instructor } and {course, student}
3. {instructor, course } and {instructor, student}
 All three decompositions will lose fd1. We have to settle for sacrificing
the functional dependency preservation. But we cannot sacrifice the
non-additivity property after decomposition.
 Out of the above three, only the 3 rd decomposition will not generate
spurious tuples after join.(and hence has the non-additivity property).
 A test to determine whether a binary decomposition (decomposition into
two relations) is nonadditive (lossless) is discussed in section 11.1.4
under Property LJ1. Verify that the third decomposition above meets the
property.

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-67


Copyright © 2004 Ramez Elmasri and Shamkant Navathe

You might also like