Database Normalization
Md. Fazlul Karim Patwary
IIT
Database Normalization
• Database normalization is the process of removing
redundant data from tables to improve storage efficiency,
data integrity, and scalability.
• Classification of normalization are called normal forms (or
NF), and there are algorithms for converting a given
database between them.
• Normalization generally involves splitting existing tables
into multiple ones, which must be re-joined or linked each
time a query is issued.
Database Normalization
Edgar F. Codd originally established three normal forms:
1NF, 2NF and 3NF.
• There are now others that are generally accepted, but 3NF
is widely considered to be sufficient for most applications.
• Most tables when reaching 3NF are also in BCNF (Boyce-
Codd Normal Form).
Dependencies: Some Definitions
Multivalued Attributes or repeating groups: non-key
attributes or groups of non-key attributes the values of
which are not uniquely identified by the value of the
Primary Key. In other words, their values are not
functionally dependent on primary key.
STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Dependencies: Some Definitions
Partial Dependency – when an non-key attribute is
determined by a part, but not the whole, of a
COMPOSITE primary key.
Partial
Dependency
CUSTOMER
Cust_ID Name Order_ID
101 AT&T 1234
101 AT&T 156
125 Cisco 1250
Dependencies: Some Definitions
• Transitive Dependency – when a non-key
attribute determines another non-key
attribute.
Transitive
Dependency
EMPLOYEE
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct
122 Sarah Smith 2 Mktg
Normal Forms: Review
• Unnormalized – There are multivalued
attributes or repeating groups
• 1 NF – No multivalued attributes or
repeating groups.
• 2 NF – 1 NF plus no partial dependencies
• 3 NF – 2 NF plus no transitive dependencies
Normal Forms: Definition
First Normal Form: If all attributes are directly or indirectly
determined by the primary key then the table is in First
Normal Form (1NF).
Second Normal Form: If a table is in first normal form and
there are no partial dependencies then the table is called
is in second normal form.
Third Normal Form: If a table is in second normal form and
there are no partial dependencies then the table is called
is in third normal form.
Table: 1
Title Author1 Author2 ISBN Subject Pages Publisher
Database Abraham Henry F. 0072958863 MySQL, 1168 McGraw-Hill
System Silberschatz Korth Computers
Concepts
Operating Abraham Henry F. 0471694665 Computers 944 McGraw-Hill
System Silberschatz Korth
Concepts
First Normal Form
In Table 1, we have two violations of First Normal Form:
• There are more than one author field,
• Subject field contains more than one piece of
information. With more than one value in a single field, it
would be very difficult to search for all books on a given
subject.
First Normal Table: Table 2
Title Author ISBN Subject Pages Publisher
Database System Abraham 0072958863 MySQL 1168 McGraw-Hill
Concepts Silberschatz
Database System Henry F. Korth 0072958863 Computers 1168 McGraw-Hill
Concepts
Operating System Henry F. Korth 0471694665 Computers 944 McGraw-Hill
Concepts
Operating System Abraham 0471694665 Computers 944 McGraw-Hill
Concepts Silberschatz
Decomposition
Subject Table
Subject_ID Subject
Author Table
1 MySQL
Author_ID Last Name First Name
2 Computers
1 Silberschatz Abraham
2 Korth Henry
Book Table
ISBN Title Pages Publisher
0072958863 Database System 1168 McGraw-Hill
Concepts
0471694665 Operating System 944 McGraw-Hill
Concepts
Decomposition
Publisher Table
Publisher_ID Publisher Name
1 McGraw-Hill
Book Table
ISBN Title Pages Publisher_ID
0072958863 Database System 1168 1
Concepts
0471694665 Operating System 944 1
Concepts
Example 1: Determine NF
• ISBN Title
• ISBN Publisher
• Publisher Address All attributes are directly or
indirectly determined by the
primary key; therefore, the
relation is at least in 1 NF
BOOK
ISBN Title Publisher Address
Example 1: Determine NF
• ISBN Title The relation is at least in 1NF.
• ISBN Publisher There is no COMPOSITE primary
key, therefore there can’t be
• Publisher Address partial dependencies. Therefore,
the relation is at least in 2NF
BOOK
ISBN Title Publisher Address
Example 1: Determine NF
• ISBN Title Publisher is a non-key attribute, and it
• ISBN Publisher determines Address, another non-key
attribute. Therefore, there is a
• Publisher Address transitive dependency, which means
that the relation is NOT in 3 NF.
BOOK
ISBN Title Publisher Address
Example 1: Determine NF
We know that the relation is at
• ISBN Title least in 2NF, and it is not in 3 NF.
• ISBN Publisher Therefore, we conclude that the
relation is in 2NF.
• Publisher Address
BOOK
ISBN Title Publisher Address
Example 2: Determine NF
• Product_ID Description
All attributes are directly or indirectly
determined by the primary key;
therefore, the relation is at least in 1 NF
ORDER
Order_No Product_ID Description
Example 2: Determine NF
• Product_ID Description
The relation is at least in 1NF.
There is a COMPOSITE Primary Key (PK) (Order_No,
Product_ID), therefore there can be partial dependencies.
Product_ID, which is a part of PK, determines Description;
hence, there is a partial dependency. Therefore, the
relation is not 2NF. No sense to check for transitive
dependencies!
ORDER
Order_No Product_ID Description
Example 2: Determine NF
• Product_ID Description
We know that the relation is at least in
1NF, and it is not in 2 NF. Therefore, we
conclude that the relation is in 1 NF.
ORDER
Order_No Product_ID Description
Example 2: Determine NF
• Product_ID Description
In your solution you will write the following
justification:
1) No M/V attributes, therefore at least 1NF
2) There is a partial dependency (Product_ID
Description), therefore not in 2NF
Conclusion: The relation is in 1NF
ORDER
Order_No Product_ID Description
Example 3: Determine NF
• Part_ID Description Comp_ID and No are not
• Part_ID Price determined by the primary
key; therefore, the relation is
• Part_ID, Comp_ID No NOT in 1 NF. No sense in
looking at partial or transitive
dependencies.
PART
Part_ID Descr Price Comp_ID No
Example 3: Determine NF
In your solution you will write the
• Part_ID Description following justification:
• Part_ID Price 1) There are M/V attributes;
therefore, not 1NF
• Part_ID, Comp_ID No Conclusion: The relation is not
normalized.
PART
Part_ID Descr Price Comp_ID No
Bringing a Relation to 1NF
STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 1NF
• Option 1: Make a determinant of the
repeating group (or the multivalued
attribute) a part of the primary key.
Composite
Primary Key
STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 1NF
• Option 2: Remove the entire repeating group from the
relation. Create another relation which would contain all
the attributes of the repeating group, plus the primary
key from the first relation. In this new relation, the
primary key from the original relation and the
determinant of the repeating group will comprise a
primary key.
STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 1NF
STUDENT
Stud_ID Name
101 Lennon
125 Jonson
STUDENT_COURSE
Stud_ID Course Units
101 MSI 250 3
101 MSI 415 3
125 MSI 331 3
Bringing a Relation to 2NF
Composite
Primary Key
STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 2NF
• Goal: Remove Partial Dependencies
Partial
Composite Dependencies
Primary Key
STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 2NF
• Remove attributes that are dependent from the part but
not the whole of the primary key from the original
relation. For each partial dependency, create a new
relation, with the corresponding part of the primary key
from the original as the primary key.
STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 2NF
CUSTOMER
STUDENT_COURSE
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00 Stud_ID Course_ID
125 Johnson MSI 331 3.00
101 MSI 250
101 MSI 415
125 MSI 331
STUDENT COURSE
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 3NF
• Goal: Get rid of transitive dependencies.
Transitive
Dependency
EMPLOYEE
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct
122 Sarah Smith 2 Mktg
Bringing a Relation to 3NF
• Remove the attributes, which are dependent on a non-
key attribute, from the original relation. For each
transitive dependency, create a new relation with the
non-key attribute which is a determinant in the transitive
dependency as a primary key, and the dependent non-key
attribute as a dependent.
EMPLOYEE
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct
122 Sarah Smith 2 Mktg
Bringing a Relation to 3NF
EMPLOYEE
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct
122 Sarah Smith 2 Mktg
EMPLOYEE
Emp_ID F_Name L_Name Dept_ID
111 Mary Jones 1
122 Sarah Smith 2
DEPARTMENT
Dept_ID Dept_Name
1 Acct
2 Mktg
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional
dependencies if for all functional dependencies in F+ of the form
where R and R, at least one of the following holds:
• is trivial (i.e., )
• is a superkey for R
Example schema not in BCNF:
bor_loan = ( customer_id, loan_number, amount )
because loan_number amount holds on bor_loan but loan_number
is not a superkey
Third Normal Form
A relation schema R is in third normal form (3NF) if
• for all functional dependencies in F+ of the form ,
where R and R, at least one of the following
holds:
is a trivial functional dependency ( )
contains a key for R
every B is part of some candidate key of R
• BCNF and 3NF
– A BCNF relation is in 3NF
– A 3NF relation is not necessary in BCNF
Third Normal Form
• Example
– Consider the two relational schemas
• R1 = (cust-num, name, house-num, street, city, state)
cust-num name, house-num, street, city, state
• R2 = (house-num, street, city, state, zip)
house-num, street, city, state zip
zip state
– Are these relations in 3NF?
– For R1
• The only nontrivial functional dependencies in F+ are those with cust-num
as a member of the left-side of the FD
• As cust-num is a superkey of R1, these functional dependencies satisfy the
second condition for 3NF. i.e satisfy BCNF conditions.
Third Normal Form
• R1 = (cust-num, name, house-num, street, city, state)
cust-num name, house-num, street, city, state
• R2 = (house-num, street, city, state, zip)
house-num, street, city, state zip
zip state
– For R2
There are two kinds of nontrivial functional dependencies in F+:
– Those with (house-num, street, city, state) as a subset of the left
hand side of the FD: As (house-num, street, city, state) is a
superkey for R2, these functional dependencies satisfy the
second condition for 3NF
– Those of the form {zip} {state} where
For any such functional dependency:
( {state}) – ( {zip}) = {state} (or = )
Because state is part of a candidate key of R2, such functional
dependencies satisfy the third condition for 3NF
Create Tables
Assume the supply department in a company is in charge of
bringing parts from different manufacturers. A part is uniquely
identified by its name and manufacturer; for convenience, a part
is also given an id.
A separate delivery is necessary for each type of part, from
each manufacturer. At most one delivery is made in one day for
one type of part from one manufacturer. A “transport” (e.g.
van23) is associated with each delivery. Each transport has a
unique driver. A driver can drive more than one “transports”.
First create non-normal tables and then progressively
create normal forms.