[go: up one dir, main page]

0% found this document useful (0 votes)
34 views13 pages

Normalization Lesson

Uploaded by

Zakariya Mohamed
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
34 views13 pages

Normalization Lesson

Uploaded by

Zakariya Mohamed
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

What is Normalization?

Normalization is a database design technique that reduces data redundancy and eliminates
undesirable characteristics like Insertion, Update and Deletion Anomalies. Normalization rules
divides larger tables into smaller tables and links them using relationships. The purpose of
Normalization in SQL is to eliminate redundant (repetitive) data and ensure data is stored
logically.
Normalization is used for mainly two purposes,

 Eliminating redundant(useless) data.


 Ensuring data dependencies make sense i.e data is logically stored.

Problems Without Normalization


 If a table is not properly normalized and have data redundancy then it will not only eat
up extra memory space but will also make it difficult to handle and update the
database, without facing data loss. Insertion, Updation and Deletion Anomalies are
very frequent if database is not normalized.
To understand these anomalies let us take an example of a Student table.

rollno name branch hod office_tel

401 Akon CSE Mr. X 53337

402 Bkon CSE Mr. X 53337

403 Ckon CSE Mr. X 53337

404 Dkon CSE Mr. X 53337

 In the table above, we have data of 4 Computer Sci. students.


 As we can see, data for the fields branch, hod (Head of Department) and office_tel is
repeated for the students who are in the same branch in the college, this is Data
Redundancy.
Insertion Anomaly
Suppose for a new admission, until and unless a student opts for a branch, data of the student
cannot be inserted, or else we will have to set the branch information as NULL.
Also, if we have to insert data of 100 students of same branch, then the branch information
will be repeated for all those 100 students.
These scenarios are nothing but Insertion anomalies.

Updation Anomaly
What if Mr. X leaves the college? or is no longer the HOD of computer science department?
In that case all the student records will have to be updated, and if by mistake we miss any
record, it will lead to data inconsistency. This is Updation anomaly.

Deletion Anomaly
In our Student table, two different information’s are kept together, Student information and
Branch information. Hence, at the end of the academic year, if student records are deleted, we
will also lose the branch information. This is Deletion anomaly.
Normalization rules are divided into the following normal forms:

1. First Normal Form


2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form

Database Normalization With Examples

Database Normalization Example can be easily understood with the help of a case study.

Assume, a video library maintains a database of movies rented out. Without any normalization
in database, all information is stored in one table as shown below. Let's understand
Normalization in database with tables example:
Table 1

Here you see Movies Rented column has multiple values. Now let's move into 1st Normal
Forms:

1NF (First Normal Form) Rules

 Each table cell should contain a single value.(atomic)


 Each record needs to be unique.

The above table in 1NF-

1NF Example

Table 1: In 1NF Form


What is a KEY?
A KEY is a value used to identify a record in a table uniquely. A KEY could be a single
column or combination of multiple columns
Note: Columns in a table that are NOT used to identify a record uniquely are called non-key
columns.
What is a Primary Key?

A primary is a single column value used to identify a database record uniquely.


It has following attributes
 A primary key cannot be NULL
 A primary key value must be unique
 The primary key values should rarely be changed
 The primary key must be given a value when a new record is inserted.
What is Composite Key?
A composite key is a primary key composed of multiple columns used to identify a record
uniquely
In our database, we have two people with the same name Robert Phil, but they live in
different places.

Hence, we require both Full Name and Address to identify a record uniquely. That is a
composite key.
Let's move into second normal form 2NF

2NF (Second Normal Form) Rules


 Rule 1- Be in 1NF
 Rule 2- Single Column Primary Key
It is clear that we can't move forward to make our simple database in 2nd Normalization form
unless we partition the table above.

Table 1

Table 2
We have divided our 1NF table into two tables viz. Table 1 and Table2. Table 1 contains
member information. Table 2 contains information on movies rented.
We have introduced a new column called Membership_id which is the primary key for table 1.
Records can be uniquely identified in Table 1 using membership id
Database - Foreign Key
In Table 2, Membership_ID is the Foreign Key
Foreign Key references the primary key of another Table! It helps connect your Tables
 A foreign key can have a different name from its primary key
 It ensures rows in one table have corresponding rows in another
 Unlike the Primary key, they do not have to be unique. Most often they aren't
 Foreign keys can be null even though primary keys can not
Why do you need a foreign key?
Suppose, a novice inserts a record in Table B such as

You will only be able to insert values into your foreign key that exist in the unique key in the
parent table. This helps in referential integrity.
The above problem can be overcome by declaring membership id from Table2 as foreign key
of membership id from Table1
Now, if somebody tries to insert a value in the membership id field that does not exist in the
parent table, an error will be shown!

NF (Third Normal Form) Rules


 Rule 1- Be in 2NF
 Rule 2- Has no transitive functional dependencies
To move our 2NF table into 3NF, we again need to again divide our table.
3NF Example
Third Normal Form is an upgrade to Second Normal Form. When a table is in the Second
Normal Form and has no transitive dependency, then it is in the Third Normal Form.
So let's use the same example, where we have 3 tables, Student, Subject and Score.
Student Table

student_id name reg_no branch address


10 Akon 07-WY CSE Kerala

11 Akon 08-WY IT Gujarat

12 Bkon 09-WY IT Rajasthan

Subject Table

subject_id subject_name teacher

1 Java Java Teacher

2 C++ C++ Teacher

3 Php Php Teacher

Score Table

score_id student_id subject_id marks

1 10 1 70

2 10 2 75

3 11 1 80

In the Score table, we need to store some more information, which is the exam name
and total marks, so let's add 2 more columns to the Score table.
score_id student_id subject_id marks exam_name total_marks

Requirements for Third Normal Form


For a table to be in the third normal form,

1. It should be in the Second Normal form.


2. And it should not have Transitive Dependency.

What is Transitive Dependency?


With exam_name and total_marks added to our Score table, it saves more data
now. Primary key for our Score table is a composite key, which means it's made
up of two attributes or columns → student_id + subject_id.
Our new column exam_name depends on both student and subject. For example,
a mechanical engineering student will have Workshop exam but a computer
science student won't. And for some subjects you have Prctical exams and for
some you don't. So we can say that exam_name is dependent on
both student_id and subject_id.
And what about our second new column total_marks? Does it depend on our
Score table's primary key?
Well, the column total_marks depends on exam_name as with exam type the
total score changes. For example, practicals are of less marks while theory exams
are of more marks.
But, exam_name is just another column in the score table. It is not a primary key
or even a part of the primary key, and total_marks depends on it.
This is Transitive Dependency. When a non-prime attribute depends on other
non-prime attributes rather than depending upon the prime attributes or primary
key.

How to remove Transitive Dependency?


Again the solution is very simple. Take out the
columns exam_name and total_marks from Score table and put them in
an Exam table and use the exam_id wherever required.
Score Table: In 3rd Normal Form

score_id student_id subject_id marks exam_id

The new Exam table

exam_id exam_name total_marks

1 Workshop 200

2 Mains 70

3 Practicals 30

Advantage of removing Transitive Dependency


The advantage of removing transitive dependency is,

 Amount of data duplication is reduced.


 Data integrity achieved.

Fourth Normal Form (4NF)


Fourth Normal Form comes into picture when Multi-valued Dependency occur in any
relation. In this tutorial we will learn about Multi-valued Dependency, how to remove it and
how to make any table satisfy the fourth normal form.
Rules for 4th Normal Form
For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:

1. It should be in the Boyce-Codd Normal Form.


2. And, the table should not have any Multi-valued Dependency.

Let's try to understand what multi-valued dependency is in the next section.

What is Multi-valued Dependency?


A table is said to have multi-valued dependency, if the following conditions are true,

1. For a dependency A → B, if for a single value of A, multiple value of B exists, then


the table may have multi-valued dependency.
2. Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B,
then B and C should be independent of each other.

If all these conditions are true for any relation(table), it is said to have multi-valued
dependency.
Time for an Example
Below we have a college enrolment table with columns s_id, course and hobby.

s_id course hobby

1 Science Cricket

1 Maths Hockey

2 C# Cricket

2 Php Hockey

As you can see in the table above, student with s_id 1 has opted for two
courses, Science and Maths, and has two hobbies, Cricket and Hockey.
You must be thinking what problem this can lead to, right?
Well the two records for student with s_id 1, will give rise to two more records, as shown
below, because for one student, two hobbies exists, hence along with both the courses, these
hobbies should be specified.
s_id course hobby

1 Science Cricket

1 Maths Hockey

1 Science Hockey

1 Maths Cricket

And, in the table above, there is no relationship between the columns course and hobby. They
are independent of each other.
So there is multi-value dependency, which leads to un-necessary repetition of data and other
anomalies as well.

How to satisfy 4th Normal Form?


To make the above relation satify the 4th normal form, we can decompose the table into 2
tables.
Course Opted Table

s_id course

1 Science

1 Maths

2 C#

2 Php

And, Hobbies Table,

s_id hobby

1 Cricket

1 Hockey

2 Cricket
2 Hockey

Now this relation satisfies the fourth normal form.


A table can also have functional dependency along with multi-valued dependency. In that case,
the functionally dependent columns are moved in a separate table and the multi-valued
dependent columns are moved to separate tables.
If you design your database carefully, you can easily avoid these issues.

You might also like