Database Normalization
R. Balcita
Introduction
Database Normalization is the process of structuring a relational database to reduce redundancy
and improve data integrity. It involves organizing data into tables and ensuring that relationships
between them are properly defined.
Normalization follows a set of rules known as normal forms (NF), which range from 1NF
(First Normal Form) to 5NF (Fifth Normal Form) and even 6NF in some cases.
Normalization Process with Examples
Unnormalized Table (UNF)
Before normalization, a table may have redundant and inconsistent data.
Example: Student Table (Unnormalized Form)
Student_ID Student_Name Course Instructor Instructor_Phone
101 Alice Math Prof. John 123-456-7890
101 Alice Science Prof. Mark 987-654-3210
102 Bob Math Prof. John 123-456-7890
� Issues:
Data redundancy (Instructor info repeated).
Difficult to update instructor details.
First Normal Form (1NF) – Remove Duplicates & Ensure Atomicity
Rules:
1. Each column must contain atomic (indivisible) values.
2. Each row should have a unique identifier (Primary Key).
3. No duplicate columns.
1NF Conversion
Student_ID Student_Name Course_ID Course Instructor_ID Instructor Instructor_Phone
101 Alice C01 Math I01 Prof. John 123-456-7890
101 Alice C02 Science I02 Prof. Mark 987-654-3210
102 Bob C01 Math I01 Prof. John 123-456-7890
� Improvements:
Each column contains atomic values.
No multiple values in a single cell.
A new Course_ID and Instructor_ID ensure better structuring.
Second Normal Form (2NF) – Remove Partial Dependency
Rules:
1. Meet 1NF requirements.
2. Remove partial dependencies (No attribute should depend on part of a composite
primary key).
Identify Partial Dependency
Course Name & Instructor info depend on Course_ID, not Student_ID.
Split into separate tables:
2NF Conversion
Student Table
Student_ID Student_Name
101 Alice
102 Bob
Course Enrollment Table
Student_ID Course_ID
101 C01
101 C02
102 C01
Course Table
Course_ID Course Instructor_ID
C01 Math I01
C02 Science I02
Instructor Table
Instructor_ID Instructor Instructor_Phone
I01 Prof. John 123-456-7890
I02 Prof. Mark 987-654-3210
� Improvements:
Removed partial dependency by splitting data into separate tables.
Instructor and Course details are now independent.
Third Normal Form (3NF) – Remove Transitive Dependency
Rules:
1. Meet 2NF requirements.
2. Remove transitive dependencies (Non-key columns should not depend on other non-key
columns).
Identify Transitive Dependency
Instructor_Phone depends on Instructor, not directly on Course.
3NF Conversion (Already Achieved in 2NF!)
Since the Instructor Table exists separately, there are no transitive dependencies.
Higher Normal Forms (4NF & 5NF – Advanced Cases)
4NF: Removes multi-valued dependencies.
5NF: Ensures no join dependencies exist.
� Most real-world databases stop at 3NF or BCNF (Boyce-Codd Normal Form).
Quiz: Test Your Knowledge!
1. Which normal form eliminates repeating groups in a table?
a) 1NF
b) 2NF
c) 3NF
d) 4NF
2. A table in 2NF must first be in which normal form?
a) 3NF
b) 1NF
c) 4NF
d) BCNF
3. Which issue does 3NF mainly address?
a) Partial dependency
b) Transitive dependency
c) Data duplication
d) Multi-valued dependency
4. In which normal form is every determinant a candidate key?
a) 1NF
b) 2NF
c) BCNF
d) 4NF
5. Which of the following is a transitive dependency?
a) Student_ID → Student_Name
b) Course_ID → Instructor → Instructor_Phone
c) Student_ID → Course_ID
d) Course_ID → Course_Name
Conclusion
Normalization helps maintain data integrity, reduce redundancy, and improve efficiency in a
database. Most databases aim for 3NF or BCNF for a good balance between efficiency and
complexity.