DDBMS
Normalization
Normalization
■ Normalization is a process that “improves” a database
design by generating relations that are of higher
normal forms.
■ The objective of normalization:
■ “to create relations where every dependency is on the
key, the whole key, and nothing but the key”.
Reasons for
Normalization
■ There are three main reasons to
normalize a database.
– The first is to minimize duplicate data,
– the second is to minimize or avoid data
modification issues,
– and the third is to simplify queries
Normalization
1NF a relation in BCNF, is also
2NF in 3NF
3NF a relation in 3NF is also in
2NF
BCNF
a relation in 2NF is also in
1NF
Normalization
■ A design that has a lower normal form than another
design has more redundancy.
■ Uncontrolled redundancy can lead to data integrity
problems.
■ The concept of functional dependency
Functional Dependencies
Functional Dependencies
We say an attribute, B, has a functional dependency on
another attribute, A, if for any two records, which have
the same value for A, then the values for B in these two
records must be the same. We illustrate this as:
AB
Example: Suppose we keep track of employee email
addresses, and we only track one email address for each
employee. Suppose each employee is identified by their
unique employee number. We say there is a functional
dependency of email address on employee number:
employee number email address
Functional Dependencies
EmpNum EmpEmail EmpFname EmpLname
123 jdoe@abc.com John Doe
456 psmith@abc.com Peter Smith
555 alee1@abc.com Alan Lee
633 pdoe@abc.com Peter Doe
787 alee2@abc.com Alan Lee
If EmpNum is the PK then the FDs:
EmpNum EmpEmail
EmpNum EmpFname
EmpNum EmpLname
must exist.
Functional Dependencies
EmpNum EmpEmail
EmpNum EmpFname 3 different ways
EmpNum EmpLname you might see FDs
depicted
EmpEmail
EmpNum EmpFname
EmpLname
EmpNum EmpEmail EmpFname EmpLname
Determinant
Functional Dependency
EmpNum EmpEmail
Attribute on the LHS is known as the determinant
• EmpNum is a determinant of EmpEmail
Transitive dependency
Transitive dependency
Consider attributes A, B, and C, and where
A B and B C.
Functional dependencies are transitive, which
means that we also have the functional dependency
AC
We say that C is transitively dependent on A
through B.
Transitive dependency
EmpNum DeptNum
EmpNum EmpEmail DeptNum DeptNname
DeptNum DeptName
EmpNum EmpEmail DeptNum DeptNname
DeptName is transitively dependent on EmpNum via DeptNum
EmpNum DeptName
Partial dependency
A partial dependency exists when an attribute B is
functionally dependent on an attribute A, and A is a
component of a multipart candidate key.
InvNum LineNum Qty InvDate
Candidate keys: {InvNum, LineNum} InvDate is
partially dependent on {InvNum, LineNum} as
InvNum is a determinant of InvDate and InvNum is
part of a candidate key
Take the following table.
StudentID is the primary key.
Is it 1NF?
No. There are repeating groups
(subject, subjectcost, grade)
How can you make it 1NF?
Create new rows so each cell
contains only one value
But now look – is the studentID
primary key still valid?
No – the studentID no longer
uniquely identifies each row
You now need to declare studentID and
subject together to uniquely identify
each row.
So the new key is StudentID and Subject.
So. We now have 1NF.
Is it 2NF?
Studentname and address are
dependent on studentID (which is
part of the key)
This is good.
But they are not
dependent on Subject (the
other part of the key)
And 2NF requires…
All non-key fields are
dependent on the
ENTIRE key (studentID +
subject)
So it’s not 2NF
How can we fix it?
Make new tables
■ Make a new table for each primary key field
■ Give each new table its own primary key
■ Move columns from the original table to the new
table that matches their primary key…
Step 1
STUDENT TABLE (key = StudentID)
Step 2
STUDENT TABLE (key = StudentID)
SUBJECTS TABLE (key = Subject)
Step 3
STUDENT TABLE (key = StudentID)
SUBJECTS TABLE (key = Subject)
RESULTS TABLE (key = StudentID+Subject)
Step 3
STUDENT TABLE (key = StudentID)
SUBJECTS TABLE (key = Subject)
RESULTS TABLE (key = StudentID+Subject)
Step 4 - relationships
STUDENT TABLE (key = StudentID)
SUBJECTS TABLE (key = Subject)
RESULTS TABLE (key = StudentID+Subject)
Step 4 - cardinality
STUDENT TABLE (key = StudentID)
1 Each student can only
appear ONCE in the SUBJECTS TABLE (key = Subject)
student table
RESULTS TABLE (key = StudentID+Subject)
Step 4 - cardinality
STUDENT TABLE (key = StudentID)
1
SUBJECTS TABLE (key = Subject)
Each subject can only
appear ONCE in the
subjects table
RESULTS TABLE (key = StudentID+Subject)
Step 4 - cardinality
STUDENT TABLE (key = StudentID)
1
SUBJECTS TABLE (key = Subject)
1
A subject can be listed
MANY times in the
results table (for
different students)
8
RESULTS TABLE (key = StudentID+Subject)
Step 4 - cardinality
STUDENT TABLE (key = StudentID)
1
SUBJECTS TABLE (key = Subject)
A student can be listed
MANY times in the 1
results table (for
different subjects)
8
RESULTS TABLE (key = StudentID+Subject)
A 2NF check
STUDENT TABLE (key = StudentID)
1
SUBJECTS TABLE (key = Subject)
SubjectCost is only
dependent on the
8
primary key,
Subject
RESULTS TABLE (key = StudentID+Subject)
A 2NF check
STUDENT TABLE (key = StudentID)
1
SUBJECTS TABLE (key = Subject)
1
8
Grade is only dependent
on the primary key
(studentID + subject)
RESULTS TABLE (key = StudentID+Subject)
A 2NF check
STUDENT TABLE (key = StudentID)
1 Name, Address are only
dependent on the SUBJECTS TABLE (key = Subject)
primary key 1
(StudentID)
8
RESULTS TABLE (key = StudentID+Subject)
STUDENT TABLE (key = StudentID)
So it is
1
SUBJECTS TABLE (key = Subject)
2NF! 1
But is it 3NF?
8
RESULTS TABLE (key = StudentID+Subject)
A 3NF check
STUDENT TABLE (key = StudentID)
1
What? SUBJECTS TABLE (key = Subject)
1
8
RESULTS TABLE (key = StudentID+Subject)
A 3NF check
STUDENT TABLE (key = StudentID)
1
HouseName is
SUBJECTS TABLE (key = Subject)
dependent on both
1
StudentID +
HouseColour
8
RESULTS TABLE (key = StudentID+Subject)
A 3NF check
STUDENT TABLE (key = StudentID)
1
Or HouseColour is
SUBJECTS TABLE (key = Subject)
dependent on both
1
StudentID +
HouseName
8
RESULTS TABLE (key = StudentID+Subject)
A 3NF check
STUDENT TABLE (key = StudentID)
1
But either way,
non-key fields are SUBJECTS TABLE (key = Subject)
dependent on MORE 1
THAN THE PRIMARY
KEY (studentID)
8
RESULTS TABLE (key = StudentID+Subject)
A 3NF check
STUDENT TABLE (key = StudentID)
1
And 3NF says that
non-key fields must SUBJECTS TABLE (key = Subject)
depend on nothing 1
but the key
8
RESULTS TABLE (key = StudentID+Subject)
A 3NF check
STUDENT TABLE (key = StudentID)
What to do?
1
SUBJECTS TABLE (key = Subject)
1
8
RESULTS TABLE (key = StudentID+Subject)
Again, carve off the offending
fields
SUBJECTS TABLE (key = Subject)
8
RESULTS TABLE (key = StudentID+Subject)
A 3NF fix
SUBJECTS TABLE (key = Subject)
8
RESULTS TABLE (key = StudentID+Subject)
A 3NF fix
8
1
1
SUBJECTS TABLE (key = Subject)
8
RESULTS TABLE (key = StudentID+Subject)
A 3NF!
8
1
1
8
RESULTS TABLE (key = StudentID+Subject)
SUBJECTS TABLE (key = Subject)
Finally
Before…
After…
1
8
1
1
8
SUBJECTS TABLE (key = Subject)
RESULTS TABLE (key = StudentID+Subject)
Class Assignment
Convert the given relation into 3rd Normal Form
SalesStaff
EmployeeI SalesPers SalesOffic OfficeNumb Customer Custome Custome
D on e er 1 r2 r3
1003 Khawaja FSD 0092-123456 HP Apple
1004 Kashif LHE 0092-123457 DELL
1005 Khan KHI 0092-123458 IBM DELL
1006 Nawaz LHE 0092-123459 HP APPLE