What is Data
Normalization?
Normalization is the process
to eliminate data
redundancy and enhance
data integrity in the table.
Read
more
Created by
Pranjal Meshram
1.
Normalization is the process of organizing the data
in the database.
2.
It divides larger table into smaller and links them
using relationships. Hence removing data
modification anomalies like Insertion, Updation
and Deletion
3.
Normalization works through a series of stages
called Normal forms. The normal forms apply to
individual relations. The relation is said to be in
particular normal form if it satisfies constraints.
Created by Slide
Pranjal Meshram
Why do we need
Normalization?
A large database defined as a single table can lead
to several redundancies in data:
Data Duplication
Inefficient Disk Space Utilization
Updating and Maintenance Challenges:
as it requires searching through numerous
records within the table.
Increased Risk of Errors and
Inconsistencies: as multiple instances of the
same data must be updated consistently.
Created by Slide
Pranjal Meshram
Types of Normal
Form
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form
(BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF)
Created by
Pranjal Meshram
First Normal
Form (1NF)
A table is in 1NF if:
All columns contain only atomic (indivisible) values.
Each column contains values of a single type.
Example:
CustomerID CustomerName Orders
1 Alice Order1, Order2
2 Bob Order3
1NF Conversion
CustomerID CustomerName Order
1 Alice Order1
1 Alice Order2
Created by 2 Bob Order3
Pranjal Meshram
Second Normal
Form (2NF)
A table is in 2NF if:
It is in 1NF.
All non-key attributes are fully functionally dependent
on the entire primary key.
The table should not possess partial dependency.
Example: 2NF Conversion
OrderID ProductID OrderDate ProductName OrderID ProductID OrderDate
1 101 2023-06-01 Widget A 1 101 2023-06-01
1 102 2023-06-01 Widget B 1 102 2023-06-01
2 101 2023-06-02 Widget A 2 101 2023-06-02
ProductName is dependent only on ProductID ProductName
ProductID, not the entire primary
key. Split the table: 101 Widget A
Created by
102 Widget B
Pranjal Meshram
Third Normal
Form (3NF)
A table is in 3NF if:
It is in 2NF.
There are no transitive dependencies, meaning non-
key attributes are not dependent on other non-key
attributes.
Example: 3NF Conversion
OrderID CustomerID OrderDate CustomerAddress OrderID CustomerID OrderDate
1 1 2023-06-01 Address1 1 1 2023-06-01
2 2 2023-06-02 Address2 2 2 2023-06-02
OrderID determines CustomerID and CustomerID
determines CustomerAddress. But the primary key CustomerID CustomerAddress
here is OrderID only. Therefore, OrderID
determines CustomerAddress via CustomerID. 1 Address1
This implies that the table possesses a transitive
functional dependency. Split the table: 2 Address2
Created by
Pranjal Meshram
Boyce-Codd Normal
Form (BCNF)
A table is in BCNF if:
It is in 3NF.
Every determinant is a candidate key. (A determinant
is an attribute that determines other attributes).
Example: BCNF Conversion
ProfessorID CourseID ProfessorName ProfessorID ProfessorName
1 101 Prof. Smith 1 Prof. Smith
2 102 Prof. Johnson 2 Prof. Johnson
Here a professor can teach multiple courses.
But if a course can only be taught by one CourseID ProfessorID
professor, then CourseID should determine
ProfessorID. Split the table: 101 1
Created by 102 2
Pranjal Meshram
Fourth Normal
Form (4NF)
A table is in 4NF if:
It is in BCNF.
It has no multi-valued dependencies (a scenario where
one attribute determines a set of values).
Example: 4NF Conversion
EmployeeID Skill Project EmployeeID Skill EmployeeID Project
1 Java A 1 Java 1 A
1 SQL A 1 SQL 1 B
1 Java B 2 Python
2 C
1 SQL B 2 Java
2 Python C EmployeeID determines Skill. and EmployeeID
also determines Project.
2 Java C
These dependencies mean that each
employee can have multiple skills and work on
Created by multiple projects, leading to data redundancy
Pranjal Meshram and anomalies.
Fifth Normal
Form (5NF)
A table is in 5NF if:
It is in 4NF.
It is decomposed into smaller tables to eliminate
redundancy while preserving data integrity and
ensuring that all join dependencies are lossless.
Example: 5NF Conversion
Employee Departme ProjectID EmployeeID ProjectID DepartmentID
ProjectID
ID ntID
1 1 1 10
1 1 10
1 2 1 20
1 2 20
2 1 2 10
2 1 10
EmployeeID DepartmentID
Decompose into separate tables
1 10
Created by 2 20
Pranjal Meshram
Candidate Key
A candidate key is a set of one or more columns that can
identify a record uniquely in a table. It is a super key with no
unnecessary attributes. There can be multiple candidate keys in
a table. One of the candidate keys can be chosen as the primary
key.
Super Key
A super key can consist of one or more columns whose
combined values are unique across all rows in the table. It may
contain additional attributes that are not necessary for unique
identification. There can be many super keys in a table. It
includes candidate keys and the primary key as special cases.
Created by
Pranjal Meshram