CSE 324: Database Systems
Lecture 12: Normalization
11/12/22 DB:Normalization 1
Objectives
• Introduction +
• Objectives of Normalization +
• Normal Forms +
• The Process of Normalization +
• Un Normalized Form (UNF) +
• Converting UNF to Normalized Form +
• Summary +
11/12/22 DB:Normalization 2
- Introduction
• A process to validate and improve logical design so that it satisfies
certain constraints that avoid unnecessary duplication of the data.
• Normalization is a formal technique for analyzing a relation based
on its primary key and the Functional Dependencies FDs between
the attributes of that relation.
• The normalization process was first proposed by Codd 1972.
• The normalization process takes a relation schema through a series
of tests to certify whether it satisfies a certain normal form.
11/12/22 DB:Normalization 3
- Objectives of Normalization
• Solve problems associated with redundant data.
• Identify various types of update anomalies; including insertion,
deletion, and modification anomalies.
• Recognize the appropriateness or quality of the design of relations.
• Use FDs to group attributes into relations that are in a known
normal form.
11/12/22 DB:Normalization 4
- Objectives of Normalization
Normalization decomposes unsatisfactory "bad" relations
by breaking up their attributes into smaller relations
11/12/22 DB:Normalization 5
- Normal Forms
• The four most commonly used normal forms are:
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• To convert un-normalized table to a normalized one, you first convert
it to 1NF, then to 2NF, then to 3NF and then to BCNF:
UNF 1NF 2NF 3NF BCNF
11/12/22 DB:Normalization 6
-- Relationship Between Normal
Forms
11/12/22 DB:Normalization 7
- Unnormalized Form (UNF)
• A table would be in unnormalized form if it contains:
• multi-valued attributes; or
• repeating (nested) groups
• To create an unnormalized table: Transform the data from the
information source (e.g. form , invoice) into table format with columns
and rows.
11/12/22 DB:Normalization 9
Figure 14.9
(a) A relation schema that is not in 1NF.
(b) Sample state of relation DEPARTMENT.
11/12/22 DB:Normalization 10
Figure 14.10
(a) Schema of the EMP_PROJ relation with a nested attributes PROJS.
(b) Sample extension of the EMP_PROJ relation showing nested attributes within each tuple.
11/12/22 DB:Normalization 11
- Converting From UNF to
Normalized Form
• 1NF +
• UNF to 1NF +
• Example: UNF to 1NF +
• 2NF +
• 1NF to 2NF +
• Example: 1NF to 2NF +
• 3NF +
• 2NF to 3NF +
• Example: 2NF to 3NF +
• BCNF +
• 3NF to BCNF +
• Example: 3NF to BCNF +
11/12/22 DB:Normalization 12
-- First Normal Form (1NF)
• A relation in which the intersection of each row and column contains one
and only one value.
• Disallows
• composite attributes
• multivalued attributes
• Repeating groups (nested relations); attributes whose values for an individual tuple are
non-atomic
11/12/22 DB:Normalization 13
-- UNF to 1NF
• Two Techniques:
• First:
• Expand the key so that there will be a separate tuple in the
original DEPARTMENT relation for each location of a
DEPARTMENT, as shown in Figure 14.9(c).
• In this case, the primary key becomes the combination {Dnumber,
Dlocation}.
• This solution has the disadvantage of introducing redundancy in the
relation and hence is rarely adopted.
• This redundancy will be removed in the higher normal forms.
11/12/22 DB:Normalization 14
-- UNF to 1NF
• Two Techniques:
• Second:
• Remove the attributes that violates 1NF and place it in a
separate relation along with the primary key as shown in
Figure 14. 10 (c)
• This option is generally considered the best, because it
does not suffer from redundancy and it is completely
general.
11/12/22 DB:Normalization 15
-- Example: UNF to 1NF
Figure 14.9
(a) A relation schema that is not in 1NF.
(b) Sample state of relation DEPARTMENT.
(c) 1NF version of the same relation with
redundancy (Note that the primary key is
the combination of Dnumber and
Dlocation)
11/12/22 DB:Normalization 16
-- Example: UNF to 1NF
Figure 14.10
(a) Schema of the EMP_PROJ relation with a
nested attributes PROJS.
(b) Sample extension of the EMP_PROJ
relation showing nested attributes within
each tuple.
(c) Decomposition of EMP_PROJ into relations
EMP_PROJ1 and EMP_PROJ2 by
propagating the primary key.
11/12/22 DB:Normalization 17
-- Second Normal Form (2NF)
• Based on the concept of full functional dependency.
• Full functional dependency indicates that if
• A and B are attributes of a relation,
• B is fully dependent on A if B is functionally
dependent on A but not on any proper subset of A.
• Example: In the last table {SSN, Pnumber} Hours is a full
dependency. (neither SSN Hours nor Pnumber hours hold.
• A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key.
11/12/22 DB:Normalization 18
-- 1NF to 2NF
• If a primary key is a single attribute, then the relation is
already in 2NF.
• In the case of composite key, the following steps are applied to
convert to 2NF:
• Identify the primary key for the 1NF
• Identify the functional dependencies in the relation
• If partial dependency exists on the primary key, remove them by
placing them in a new relation along with a copy of their
determinant.
11/12/22 DB:Normalization 19
--- Example: 1NF to 2NF
FD1
1NF SSN Pnumber Hours Ename Pname Plocation
FD2
Get Rid of FD1
SSN Ename SSN Pnumber Hours Pname Plocation
FD2
Get Rid of FD2
2NF SSN Ename SSN Pnumber Hours Pnumber Pname Plocation
11/12/22 DB:Normalization 20
-- Third Normal Form (3NF)
• 3NF is based on the concept of transitive dependency.
• Transitive Dependency is a condition where
• A, B and C are attributes of a relation such that if A B
and B C,
• then C is transitively dependent on A through B.
(Provided that A is not functionally dependent on B or C).
11/12/22 DB:Normalization 21
-- 2NF to 3NF
• Identify the primary key in the 2NF relation.
• Identify functional dependencies in the relation.
• If transitive dependencies exist on the primary key,
remove them by placing them in a new relation
along with a copy of their determinant.
11/12/22 DB:Normalization 22
-- Example: 2NF to 3NF
EMP_DEPT Enam SSN Bdat Addres Dnumber Dnam DMGRSSN
e e s e
Transitive FD
Get Rid of Transitive FD
Ename SSN Bdate Address Dnumber Dnumber Dname DMGRSSN
EMPLOYEE DEPARTMENT
3NF
11/12/22 DB:Normalization 23
-- Boyce-Codd Normal Form
(BCNF)
• Based on functional dependencies that takes into account all candidate keys in a
relation.
• For a relation with simple candidate key(s) (not composite), 3NF and BCNF are
equivalent.
• A relation is in BCNF, if and only if every determinant is a candidate key. (remember
that a determinant is the Left-hand side of FD)
• AB C
• XY
• Violation of BCNF may occur in a relation that has
• a composite candidate key (one or more)
• a FD: X → A such that X not being a (candidate) key and A is part of a composite candidate key
11/12/22 DB:Normalization 24
-- 3NF to BCNF
• Identify all candidate keys in the relation.
• Identify all functional dependencies in the relation.
• If functional dependencies exists in the relation where
their determinants are not candidate keys for the
relation, remove the functional dependencies by
placing them in a new relation along with a copy of
their determinant.
11/12/22 DB:Normalization 25
-- Example: 3NF to BCNF
composite Key
A B C D E
BCNF Normalization
New relation due to D B The new primary key is AD
D B A C D E
11/12/22 DB:Normalization 26
-- Example: 3NF to BCNF
TEACH relation with the following dependencies:
• FD1: {Student, Course} → Instructor
• FD2: Instructor → Course (i.e., each instructor teaches one course only)
• {Student, Course} is a candidate key
Figure 14.14
A sample sate for the relation TEACH
that is in 3NF but not BCNF.
11/12/22 DB:Normalization 27
-- Example: 3NF to BCNF
TEACH will be decomposed into two relations:
R1 (Instructor, Course)
and
R2(Instructor, Student)
TEACH
Student Course Instructor
R1 R2
Instructor Course Student Instructor
11/12/22 DB:Normalization 28
- Summary
UNF
Remove repeating groups
1NF
Remove partial dependencies
2NF
Remove transitive dependencies
3NF
Remove remaining dependencies
BCNF
11/12/22 DB:Normalization 29
… - Summary
• Definition
• A functional dependency X A is called trivial if, A X.
• A relation is in 1NF if, and only if, every attribute is single-valued for each tuple.
• A relation is in 2NF if, and only if, it is in 1NF and all the non-key attributes are fully
functionally dependent on the key.
• A relation is in 3NF if, when ever a non-trivial functional dependency X A exists,
then either X is a superkey or A is a member of some candidate key.
• A relation is in BCNF if, whenever a non-trivial functional dependency X A
exists, then X is a superkey.
11/12/22 DB:Normalization 30