NORMALIZATION
Definitions
• A process of analyzing a relation to ensure that it is well formed
• Normalization involves decomposing relations to produce smaller,
well-structured relations
• A formal process for deciding which attributes should be grouped
together in a relation so that all anomalies are removed
• More specifically, if a relation is normalized (well-formed), rows can
be inserted, deleted, or modified without creating anomalies
Goals
• Minimize data redundancy thereby conserving space and
avoiding anomalies
• Make it easier to maintain data
• Provide a better design that is an improved representation
of the real world
Normalization is a logical data-
modelling technique used to ensure
that data are well structured from an
organization-wide view
Modification Anomalies
Tables that are not normalized are susceptible to experiencing
modification anomalies
◦ Insertion Anomaly - occurs when certain attributes cannot be inserted into
the database without the presence of other attributes
◦ Update Anomaly - exists when one or more instances of duplicated data is updated,
but not all
◦ Deletion Anomaly - exists when certain attributes are lost because of the deletion of
other attributes
Insertion Anomaly
customerId customerName carId year make model
1 Mischka Sophia 102 2003 Volkswagen Golf
2 Kenra Elienette 101 2014 BMW 550i
3 Thea Pauline 103 2010 Nissan Sentra
Most modification problems are solved by breaking an existing
table into two or more tables through a process known as
normalization
Customer Car
customerId customerName carId carId year make model
1 Mischka Sophia 102 102 2003 Volkswagen Golf
2 Kenra Elienette 101 101 2014 BMW 550i
3 Thea Pauline 103 103 2010 Nissan Sentra
104 2020 Honda Civic
Update Anomaly
exists when one or more instances of duplicated data is updated, but not all
Customer
customerId carId year make model price
Name
Mischka
1 102 2014 Mitsubishi Mirage 900K
Sophia
Kenra
2 101 2014 Toyota Prius 2M
Elienette
Thea
3 103 2010 Nissan Sentra 900K
Pauline
4 Ruel Paolo 102 2014 Mitsubishi Mirage 900K
James
5 104 2016 Honda Civic 1.2M
Aaron
Eunice
6 102 2014 Mitsubishi Mirage 900K
Czarina
Customer
customerId carId carId year make model price
Name
Mischka 102 2014 Mitsubishi Mirage 900K
1 102
Sophia
Kenra 101 2014 Toyota Prius 2M
2 101
Elienette
Thea 103 2010 Nissan Sentra 900K
3 103
Pauline
4 Ruel Paolo 102 102 2014 Mitsubishi Mirage 900K
James 104 2016 Honda Civic 1.2M
5 104
Aaron
Eunice 102 2014 Mitsubishi Mirage 900K
6 102
Czarina
Deletion Anomaly
exists when certain attributes are lost because of the deletion of
other attributes
EmpID EmpName Address DeptID DeptName DeptMgr
1 Hanae Takada Osaka D01 Accounting Valdez
2 Atsuko Ono Tokyo D02 Operations Deverala
3 Chigusa Mami Nagoya D03 Services Talavera
4 Akiko Mizuno Kyoto D04 Training Antonio
5 Chelsea Dy Chennai D02 Operations Deverala
6 Donald Cruz Mumbai D01 Accounting Valdez
EmpID EmpName Address DeptID DeptID DeptName DeptMgr
Hanae
1 Osaka D01 D01 Accounting Valdez
Yoshimori
2 Atsuko Imai Tokyo D02 D02 Operations Deverala
3 Chigusa Sazaki Nagoya D03 D03 Services Talavera
4 Reiko Kawai Kyoto D04 D04 Training Antonio
5 Kalpint Patel Chennai D02 D02 Operations Deverala
Donald
6 Mumbai D01 D01 Accounting Valdez
McElhenny
Definitions
Functional dependency
You can also describe this as a relationship where knowing the
value of one attribute (or a set of attributes) is enough to tell you
the value of another attribute (or set of attributes) in the same
table.
Examples:
SSS No. -> Employee’s Name, Employee’s contributions …
Bank Account No. -> Customer Name, Amount of Deposit…
Partial functional dependency – when a non-key attribute is
functionally dependent on part (but not all) of the primary key
Flight NoOf
FlightDate From To
Number Passengers
1750 Kuala
UA 36 3-Sep-16 1410 Manila 200
Lumpur
0455 Kuala
AA 704 31-Oct-16 0120 Manila 150
Lumpur
1750 Kuala
UA 36 4-Sep-16 1410 Manila 190
Lumpur
0435 Kuala
BA 9 31-Oct-16 0100 Manila 100
Lumpur
Transitive dependency – a functional dependency between the
primary key and one or more nonkey attributes that are
dependent on the primary key via another nonkey attribute
This means if we have a primary key A and a non-key domain B
and C where C is more dependent on B than A and B is directly
dependent on A, then C can be considered transitively dependent
on A.
A B C
Transitive dependency – a functional dependency between the
primary key and one or more nonkey attributes that are
dependent on the primary key via another nonkey attribute
This means if we have a primary key DeptID and a non-key
domain MgrID and MgrName where MgrName is more
dependent on MgrID than DeptID and MgrID is directly
dependent on DeptID, then MgrName can be considered
transitively dependent on DeptID.
DeptID MgrID MgrName
Steps in Normalization
Normalization can be accomplished in stages, each of which
corresponds to a normal form
1. First Normal Form - Any multivalued attribute (also called
repeating groups) have been removed
2. Second Normal Form – Any partial functional dependencies have
been removed
3. Third Normal Form – Any transitive dependencies have been
removed
Order ID : 1006 Order Date : Oct 24, 2019
Customer ID : 22
Customer Name : Dan’s Furniture
Customer Address : Fullerton, California
Product ID Product Description Product Finish Unit Price Ordered Quantity
7 Tea Table Walnut $450 2
5 TV Stand Oak $300 6
4 Porch Swing Pine $800 5
Order ID : 1007 Order Date : Oct 31, 2019
Customer ID : 65
Customer Name : Furniture Barn
Customer Address : Fort Collins, Colorado
Product ID Product Description Product Finish Unit Price Ordered Quantity
11 Table Cherry $1150 3
4 Porch Swing Pine $800 7
Table with multivalued Attributes:
Not in 1st Normal Form
Order Order Cust ID Cust Cust Prod Product Prod Unit Ordered
ID Date Name Address ID Descriptio Finish Price Qty
n
1006 24 Oct 22 Dan’s Fullerton, 7 Tea Table Walnu $450 2
2019 Furniture CA 5 TV Stand t $300 6
4 Porch Oak $800 5
Swing Pine
1007 31 Oct 65 Furniture Ft. 11 Table Cherry $1150 3
2019 Barn Collins,Co 4 Porch Pine $800 7
Swing
Table with no multivalued Attributes:
In 1st Normal Form
Order Order Cust Customer Customer Prod Product Prod Unit Ordere
ID Date ID Name Address ID Description Finish Price d Qty
1006 24 Oct 22 Dan’s Fullerton, 7 Tea Table Walnut $450 2
2019 Furniture CA
1006 24 Oct 22 Dan’s Fullerton, 5 TV Stand Oak $300 6
2019 Furniture CA
1006 24 Oct 22 Dan’s Fullerton, 4 Porch Swing Pine $800 5
2019 Furniture CA
1007 31 Oct 65 Furniture Ft. 11 Table Cherry $1150 3
2019 Barn Collin,Co
1007 31 Oct 65 Furniture Ft. 4 Porch Swing Pine $800 7
2019 Barn Collin,Co
Going to the 2nd Normal Form Full Dependency
Order Order Cust Customer Customer Prod Product Product Unit Ordered
ID Date ID Name Address ID Description Finish Price Qty
Partial Dependencies Partial Dependencies
Order_ID -> Order_Date, Customer_ID, Customer_Name,
Customer_Address
Product_ID -> Product_Description, Product_Finishm Unit_Price
Order_ID, Product_ID -> Order_Quantity
Second Normal Form
1NF PLUS No partial dependencies
Order Customer Customer Customer
Order ID
Date ID Name Address
Product Product
Product ID Unit Price
Description Finish
Ordered
Order ID Product ID
Quantity
Going to the 3rd Normal Form
Transitive Dependencies
Order Customer Customer Customer
Order ID
Date ID Name Address
Product Product
Product ID Unit Price
Description Finish
Ordered
Order ID Product ID
Quantity
Third Normal Form
2NF PLUS No transitive dependencies
OrderT
Product Product
Product ID Unit Price Product
Description Finish
Ordered
Order ID Product ID Order_Detail
Quantity
Customer
Third Normal Form
2NF PLUS No transitive dependencies
OrderT
Product Product
Product ID Unit Price Product
Description Finish
Ordered
Order ID Product ID Order_Detail
Quantity
Customer
Dependency Diagram Full Dependency
Transitive Dependencies
Order Order Cust Customer Customer Prod Product Product Unit Ordered
ID Date ID Name Address ID Description Finish Price Qty
Partial Dependencies Partial Dependencies
Order_ID, Product_ID -> Order_Quantity
Order_ID -> Order_Date, Customer_ID
Product_ID -> Product_Description, Product_Finishm Unit_Price
Customer_ID -> Customer_Name, Customer_Address
UNF
ORDERPRODUCT(orderid, orderdate,custid,custname, custadrs,{prodid, proddesc,prodfin,uprice,qty})
Order Order Cust ID Cust Cust Prod Product Prod Unit Ordered
ID Date Name Address ID Descriptio Finish Price Qty
n
1006 24 Oct 22 Dan’s Fullerton, 7 Tea Table Walnu $450 2
2019 Furniture CA 5 TV Stand t $300 6
4 Porch Oak $800 5
Swing Pine
1007 31 Oct 65 Furniture Ft. 11 Table Cherry $1150 3
2019 Barn Collins,Co 4 Porch Pine $800 7
Swing
1NF
ORDERPRODUCT(orderid, orderdate,custid,custname, custadrs,prodid, proddesc,prodfin,uprice,qty)
Order Order Cust Customer Customer Prod Product Prod Unit Ordere
ID Date ID Name Address ID Description Finish Price d Qty
1006 24 Oct 22 Dan’s Fullerton, 7 Tea Table Walnut $450 2
2019 Furniture CA
1006 24 Oct 22 Dan’s Fullerton, 5 TV Stand Oak $300 6
2019 Furniture CA
1006 24 Oct 22 Dan’s Fullerton, 4 Porch Swing Pine $800 5
2019 Furniture CA
1007 31 Oct 65 Furniture Ft. 11 Table Cherry $1150 3
2019 Barn Collin,Co
1007 31 Oct 65 Furniture Ft. 4 Porch Swing Pine $800 7
2019 Barn Collin,Co
3NF
OrderT(orderID, orderdate,custid)
Product(prodid, proddesc,finish,
uprice)
OrderT
Order_detail(orderID, prodid, qty)
Customer(custid,custname,custadrs) Product Product
Product ID Unit Price
Description Finish
Ordered
Order ID Product ID Order_Detail
Quantity
Customer
Source
Modern Database Management, 11th Ed
Hoffer , Prescott, Topi
Some definitions (anomalies), examples (functional dependencies) taken from the net