NORMALIZATION
NORMALIZATION
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These
are – Insertion, update and deletion anomaly. Let’s take an example to understand this.
emp_i
d emp_name emp_address emp_dept
101 Rick Delhi D001
101 Rick Delhi D002
123 Maggie Agra D890
166 Glenn Chennai D900
166 Glenn Chennai D004
Example: Suppose a manufacturing company stores the employee details in a table named
employee that has four attributes: emp_id for storing employee’s id, emp_name for storing
employee’s name, emp_address for storing employee’s address and emp_dept for storing the
department details in which the employee works. At some point of time the table looks like this:
The above table is not normalized. We will see the problems that we face when a table is not
normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick then we have to update the
same in two rows or the data will become inconsistent. If somehow, the correct address gets
updated in one department but not in other then as per the database, Rick would be having two
different addresses, which is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and
currently not assigned to any department then we would not be able to insert the data into the
table if emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890 then
deleting the rows that are having emp_dept as D890 would also delete the information of
employee Maggie since she is assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will discuss
about normalization.
Normal Forms:
Here are the most commonly used normal forms:
First normal form(1NF)
Second normal form(2NF)
Third normal form(3NF)
Boyce & Codd normal form (BCNF)
Employee2 Table
teacher_id subject
1 ABC
1 ABC
2 PQR
3 XYZ
3 XYZ
Now, the above tables comply with the Second Normal Form (2NF).
3. Third Normal Form (3NF)
Third Normal Form (3NF) is used to minimize the transitive redundancy.
In 3NF, the table is required in 2NF.
While using the 2NF table, there should not be any transitive partial dependency.
3NF reduces the duplication of data and also achieves the data integrity.
In the above <Employee> table, EId is a primary key but City, State depends upon Zip
code.
The dependency between Zip and other fields is called Transitive Dependency.
Therefore we apply 3NF. So, we need to move the city and state to the new
<Employee_Table2> table, with Zip as a Primary key.
<Employee_Table1> Table
EId Ename DOB Zip
001 ABC 10/05/1990 411038
002 XYZ 11/05/1988 400007
<Employee_Table2> Table
City State Zip
Pune Maharashtra 411038
Mumbai Maharashtra 400007
The advantage of removing transitive dependency is, it reduces the amount of data
dependencies and achieves the data integrity.
In the above example, using with the 3NF, there is no redundancy of data while inserting
the new records.
The City, State and Zip code will be stored in the separate table. And therefore the
updation becomes more easier because of no data redundancy.