Normalization
COURSE 4: Databases
Normalization
when and why
Databases C5: Normal Forms
Normalization
• Informal:
• Organize data in a relational database in order to avoid redundancy and data
manipulation anomalies.
• Decompose a relation (table) without loosing information.
Databases C5: Normal Forms
Incorrect
Incorrect ER
table structure
INSERT/UPDATE/DELETE
REDUNDANCY
ANOMALY
Databases C5: Normal Forms
Incorrect
Incorrect ER
table structure
decomposition /
synthesis
----
normalization
INSERT/UPDATE/DELETE
REDUNDANCY
ANOMALY
Databases C5: Normal Forms
Normalization
• Avoid redundancy
Databases C5: Normal Forms
Normalization CUSTOMER
CUSTOMER_ID LAST_NAME … ….
1 Smith ... ….
• Avoid redundancy
2 Green … ….
3 Avery ... ….
LOAN
LOAN_ID CUSTOMER_ID AMOUNT DATE
101 1 125000 18/04/21
102 1 25000 14/04/22
103 2 12500 03/05/21
127 2 20000 …
389 3 75000 …
Databases C5: Normal Forms
Normalization
• Avoid redundancy
Databases C5: Normal Forms
Normalization
• Avoid redundancy
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
2 Green 127 mortgage 20000
3 Avery 389 mortgage 75000
3 Avery 486 credit card 5000
3 Avery 769 mortgage 45000
Databases C5: Normal Forms
Normalization
• INSERT anomaly
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
2 Green 127 mortgage 20000
3 Avery 389 mortgage 75000
3 Avery 486 credit card 5000
4 Stark ??? null null
Databases C5: Normal Forms
Normalization
• UPDATE anomaly
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
2 Green 127 mortgage 20000
3 Avery 389 mortgage 75000
3 Avery 486 credit card 5000
3 Avery 769 mortgage 45000
Databases C5: Normal Forms
Normalization
• DELETE anomaly
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
2 Green 127 mortgage 20000
3 Avery 389 mortgage 75000
3 Avery 486 credit card 5000
4 Stark 700 mortgage 45000
Databases C5: Normal Forms
Table with
redundancy/not in a
normal form
Analyze
dependencies
“good”
decomposition
table in normal form table in normal form
Databases C5: Normal Forms
Decomposition
Databases C5: Normal Forms
Decomposition Step 1: Projection
NAME LOAN_ID TYPE AMOUNT
➢ S1 = ς(𝑁𝐴𝑀𝐸,𝐿𝑂𝐴𝑁𝐼𝐷,𝑇𝑌𝑃𝐸,𝐴𝑀𝑂𝑈𝑁𝑇) 𝑅 Smith 101 mortgage 125000
Smith 102 credit card 25000
Green 103 credit card 12500
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
Smith 389 mortgage 75000
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
CUSTOMER_ID NAME
3 Smith 389 mortgage 75000
1 Smith
1 Smith
2 Green
➢ 𝑆2 = ς(𝐶𝑈𝑆𝑇𝑂𝑀𝐸𝑅𝐼𝐷,𝑁𝐴𝑀𝐸) 𝑅 3 Smith
Databases C5: Normal Forms
Decomposition Step 2: Join
CUSTOMER_ID NAME NAME LOAN_ID TYPE AMOUNT
1 Smith Smith 101 mortgage 125000
1 Smith Smith 102 credit card 25000
2 Green Green 103 credit card 12500
3 Smith Smith 389 mortgage 75000
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
• Lossy decomposition 1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
S1 ⋈ S2 ⊇ 𝑅
1 Smith 389 mortgage 75000
3 Smith 101 mortgage 125000
3 Smith 102 credit card 25000
3 Smith 389 mortgage 75000
2 Green 103 credit card 12500
Databases C5: Normal Forms
Decomposition Step 1: Projection
CUSTOMER_ID LOAN_ID TYPE AMOUNT
➢ S1 = ς(𝐶𝑈𝑆𝑇𝑂𝑀𝐸𝑅𝐼𝐷,𝐿𝑂𝐴𝑁𝐼𝐷,𝑇𝑌𝑃𝐸,𝐴𝑀𝑂𝑈𝑁𝑇) 𝑅1 101 mortgage 125000
1 102 credit card 25000
2 103 credit card 12500
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
3 389 mortgage 75000
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
CUSTOMER_ID NAME
3 Smith 389 mortgage 75000
1 Smith
1 Smith
2 Green
➢ 𝑆2 = ς(𝐶𝑈𝑆𝑇𝑂𝑀𝐸𝑅𝐼𝐷,𝑁𝐴𝑀𝐸) 𝑅 3 Smith
Databases C5: Normal Forms
Decomposition Step 2: Join
CUSTOMER_ID NAME CUSTOMER_ID LOAN_ID TYPE AMOUNT
1 Smith 1 101 mortgage 125000
1 Smith 1 102 credit card 25000
2 Green 2 103 credit card 12500
3 Smith 3 389 mortgage 75000
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
• Lossless decomposition 1 Smith 101 mortgage 125000
S1 ⋈ S2 = 𝑅 1 Smith 102 credit card 25000
3 Smith 389 mortgage 75000
2 Green 103 credit card 12500
Databases C5: Normal Forms
Decomposition
• lossy decompositions and lossless decompositions.
• Lossy: R → decompose(R): S1, S2 → recompose(S1,S2) ⊇ R
lossy =/= less data, (less is more!)
lossy = lost information
• Lossless R → decompose(R): S1, S2 → recompose(S1,S2) = R
Databases C5: Normal Forms
Decomposition
• Lossy
ς𝑅1 𝑅 ⋈ ς𝑅2 𝑅 ⊇ R
• Lossless
ς𝑅1 𝑅 ⋈ ς𝑅2 𝑅 = R
Databases C5: Normal Forms
Functional dependencies
Databases C5: Normal Forms
Functional dependencies
CUSTOMER_ID NAME X Y Z T
1 Smith X1 Y1 Z1 T1
1 Smith X1 Y2 Z1 T2
2 Green X2 Y2 Z2 T2
3 Smith X2 Y3 Z2 T3
X3 Y3 Z2 T4
• CUSTOMER_ID -> NAME
• X -> Z
• Z --/--> X
• X -> X
Databases C5: Normal Forms
Normal Forms
Databases C5: Normal Forms
NF1 NF2 NF3 BCNF NF4 NF5
Databases C5: Normal Forms
NF1
ATOMIC ATTRIBUTES
Databases C5: Normal Forms
NF1
• Atomic attributes
• No multi-valued attributes
• The domain of each attribute contains only atomic values and each
attribute contains only a value of its domain.
• A relational database is at least in NF1
Databases C5: Normal Forms
NF1
EMP_ID NAME EMAIL
1 Williams williams@gmail.com Optional:
Composed key
williams@yahoo.com Artificial key
2 Davis davis@gmail.com
davis@academy.com
3 Miller miller@gmail.com EMAIL_ID EMP_ID EMAIL
4 Stewart stewart@gmail.com 1 1 williams@gmail.com
office@academy.com 2 1 williams@yahoo.com
3 2 davis@gmail.com
4 2 davis@academy.com
5 3 miller@gmail.com
6 4 stewart@gmail.com
7 4 office@academy.com
Databases C5: Normal Forms
NF2
NO PARTIAL DEPENDENCIES
Databases C5: Normal Forms
NF2
• Tables in NF1
• No non-key attributes (not part of the key) that depend on a subset of
the attributes forming the key.
• There are no partial dependencies.
Databases C5: Normal Forms
Functional dependencies
X Y Z T X Y Z T
X1 Y1 Z1 T1 X1 Y1 Z1 …
X2 Y1 Z1 T2 X2 Y1 Z1 …
X2 Y2 Z2 T3 X2 Y2 Z2 …
X2 Y3 Z2 T3 X2 Y3 Z2 …
X2 Y3 Z2 T3 X2 Y3 Z2 …
• partial (X,Y) → Z
• Y →Z
Databases C5: Normal Forms
Functional dependencies
X Y Z T
X Y Z T … Y1 … T1
X1 Y1 … T1 … Y1 … T2
X2 Y1 … T2 … … … …
X2 Y2 … T3 … … … …
X2 Y3 … T3 … … … …
X2 Y3 … T3
X Y Z T
• total (X,Y) → T … … … …
X2 … … T2
• X -/-> T
X2 … … T3
• Y -/-> T
… … … …
… … … …
Databases C5: Normal Forms
Functional dependencies
AIRPORT_ID AIRPLANE_ID DEPARTURE AIRPLANE_MODEL BOARDING_GATE
1 101 30/03/20 17:00 Boeing 777 42
1 102 02/05/20 09:30 Airbus A320 50
2 201 06/08/20 10:45 Boeing 757 35
2 202 10/10/20 06:20 Airbus A320 10
1 101 06/04/20 16:35 Boeing 777 23
Databases C5: Normal Forms
NF2 dependencies
K1,K2 X, Y
K1 -> X
(K1, K2) -> Y
Databases C5: Normal Forms
NF2 dependencies
K1,K2 X, Y
K1 -> X
(K1, K2) -> Y
K1,K2 Y K1 X
Databases C5: Normal Forms
NF2 dependencies
K1,K2 X, Y
K1 -> X
(K1, K2) -> Y
K1 = AIRPLANE_ID
K2 = AIRPORT_ID, DEPARTURE K1,K2 Y K1 X
Y = BOARDING_GATE
X = AIRPLANE_MODEL
Databases C5: Normal Forms
AIRPORT_ID AIRPLANE_ID DEPARTURE AIRPLANE_MODEL BOARDING_GATE
1 101 30/03/20 17:00 Boeing 777 42
1 102 02/05/20 09:30 Airbus A320 50
2 201 06/08/20 10:45 Boeing 757 35
2 202 10/10/20 06:20 Airbus A320 10
1 101 06/04/20 16:35 Boeing 777 23
AIRPORT_ID AIRPLANE_ID DEPARTURE BOARDING_GATE AIRPLANE_ID AIRPLANE_MODEL
1 101 30/03/20 17:00 42 101 Boeing 777
1 102 02/05/20 09:30 50 102 Airbus A320
2 201 06/08/20 10:45 35 201 Boeing 757
2 202 10/10/20 06:20 10 202 Airbus A320
1 101 06/04/20 16:35 23
Databases C5: Normal Forms
NF3
NO TRANSITIVE DEPENDENCIES
Databases C5: Normal Forms
NF3
• Tables in NF2
• Non-key attributes (not part of the key) depend on the entire key and
only on the key.
• There are no transitive dependencies.
Databases C5: Normal Forms
AIRPORT_ID AIRPLANE_ID DEPARTURE MODEL CAPACITY REVISION_DATE BOARDING_GATE
1 101 30/03/20 17:00 Boeing 777 451 01/01/2021 42
1 102 02/05/20 09:30 Airbus A320 150 01/03/2020 50
2 201 06/08/20 10:45 Boeing 757 295 03/05/2020 35
2 202 10/10/20 06:20 Airbus A320 150 04/06/2021 10
1 101 06/04/20 16:35 Boeing 777 451 08/09/2020 23
AIRPORT_ID AIRPLANE_ID DEPARTURE BOARDING_GATE AIRPLANE_ID MODEL CAPACITY REVISION_DATE
1 101 30/03/20 17:00 42
101 Boeing 777 451 01/01/2021
1 102 02/05/20 09:30 50
102 Airbus A320 150 01/03/2020
2 201 06/08/20 10:45 35
201 Boeing 757 259 03/05/2020
2 202 10/10/20 06:20 10
202 Airbus A320 150 04/06/2021
1 101 06/04/20 16:35 23
Databases C5: Normal Forms
AIRPLANE_ID MODEL CAPACITY REVISION_DATE
101 Boeing 777 451 01/01/2021
102 Airbus A320 150 01/03/2020
201 Boeing 757 259 03/05/2020
202 Airbus A320 150 04/06/2021
Databases C5: Normal Forms
NF3 dependencies
K X, Y, Z
K -> X
X -> Y
Databases C5: Normal Forms
NF3 dependencies
K X, Y, Z
K -> X
X -> Y
K X, Z X Y
Databases C5: Normal Forms
NF3 dependencies
K X, Y, Z
K -> X
X -> Y
K = AIRPLANE_ID
X = AIRPLANE_MODEL K X, Z X Y
Y = CAPACITY
Z= REVISION_DATE
Databases C5: Normal Forms
AIRPLANE_ID MODEL CAPACITY REVISION_DATE
101 Boeing 777 451 01/01/2021
102 Airbus A320 150 01/03/2020
201 Boeing 757 259 03/05/2020
202 Airbus A320 150 04/06/2021
AIRPLANE_ID MODEL REVISION_DATE
MODEL CAPACITY
101 Boeing 777 01/01/2021
Boeing 777 451
102 Airbus A320 01/03/2020
Airbus A320 150
201 Boeing 757 03/05/2020
Boeing 757 259
202 Airbus A320 04/06/2021
Databases C5: Normal Forms