DBMS ch-4

Seshadripuram College Database Management Systems (DSC-4)
Unit 4
Data Normalization
NORMALIZATION:
Normalization is a process of efficiently organizing data in a database. Normalization is the
way by which a given set of relations are transformed to a new set of relations with a more solid
structure.
Normalization is a set of rules and techniques concerned with:

 Identifying relationship among attributes.
 Combining attributes to form relations.
 Combining relations to form a database.
 A normalized relational database provides several benefits:
 Eliminating of redundant data storage.
 Close modeling of real world entities, processes and their relationship.
 Structuring of data so that the model id flexible.
Relational Model constraints
Constraint: Constraints are certain properties that data and database should satisfy
with. They can be apply to column, row ,table or entire schema.
Some of the constraints are:
 Domain constraints
 Key constraints
 Integrity constraints
 Entity constraints
 Referential Constraints
Goodness of Relational DB Design
1.Semantics of attributes
It specifies how to interpret the attribute values stored in tuple
How to relate one tuple to another
Department of CS (BCA) Page 1

Ex: Each tuple in a relation should represent one entity instance
Only foreign keys should be used to refer to other entities
2. Redundant information in tuples and anomalies
The main goal of this is to minimize the storage space used by the base relations
by avoiding anomalies
Database Anomalies:
Anomalies are the problems in database relations that occur due to redundancy in
database. These anomalies affect the process of inserting, deleting and modifying
data in relations.
Different types of anomalies are
1. Insertion Anomaly: The insertion anomaly occurs when a new record is

inserted. We must insert the correct details so that they are consistent.
In the above table , until the new faculty member vasudha is assigned to teach
atleast one course, her details cannot be recorded.
2. Deletion Anomaly: The deletion anomaly occurs when a record is deleted

from the relation. This happens with unexpected error while data being
deleted, unseen drop off the data etc.

In the above table, all information about ashwini is lost when she temporarily
ceased to assign any course.
3. Modification/ Update Anomaly

The modification anomaly occurs when the record is updated in the relation.
This happens with modification of a attribute requires modification in all
records in which value occurs or with partial update etc.
In the above table, faculty 407 is having different addresses on different records as
shown.
Null Values in Tuples

If attributes does not apply to the tuples in the relation we end up with
many nulls in those tuples. This may lead to wastage of memory. It usually
happens when
 The attributes does not apply to the tuple

 The attribute value for the tuple is unknown
 The value is known but absent
Generation of spurious tuples: This happens when decomposing relations
where unwanted extra tuples are formed which leads to bad schema
Decomposition:
The decomposition of a relation schema R=(A1,A2……An)is its replacement by a set of relation
schemes (R1,R2……Rm),such that R1<=R FOR 1<=i<=m and R1 U R2 U R m=R.
A relation scheme R can be decomposed into a collection of relation schemes

{R1,R2,R3,….Rm)to eliminate some of the anomalies contained in the original relation R.
Example of decomposition
There are 2 subsets
R1(x,z) and R2(Y,Z)
If we union R1 and R2(R1 U R2),we get R
R=(x,y)U(y,z)
R=(x,y,z)
R(x,y,z)
GOALS OF DECOMPOSITION:
 Eliminate redundancy by decomposing a relation into several relations in a higher normal

form.
 It is important to check that a decomposition does not lead to bad design.
PROBLEM WITH DECOMPOSITION:
 Given instances of the decomposed relations,we may not be able to reconstruct the
corresponding instance of the original relation-information loss.

LOSSY DECOMPOSITION:
Additional tuples are obtained along with original tuples,although there are more tuples,this
leads to less information.Due to the loss of information,this decomposition is called lossy
decomposition.
LOSS LESS DECOMPOSITION:
A decomposition {R1,R2…..Rn} of a relation R is called a lossless decomposition for R if the

natural join of R1,R2…Rn produces exactly the relation R.
FUNCTIONAL DEPENDENCIES
A functional dependency (FD) is a constraint between two sets of attributes in a relation from a
database . Given a relation R, a set of attributes X in R is said to functionally determine another
attribute Y also in R,(written X->Y) if and only if each Xvalue is associated with precisely one
Y value . Here X the determinant set and Y the dependent attribute.
FUNCTIONAL DEPENDENCY:[DEFINITION]
A functional dependency is denoted by X-> Y between two sets of attributes X and Y that are
subsets of R specifies a constraint on the possible tuple that can form a relation state r of R.
The constraint is for any two tuples t1 and t2 in r if t1[x]=t2[x] then they have t1[y]=t2[y].This
means the value of X component of a tuple uniquely determines the value of component Y.
A functional dependency is an association between two attributes of the same relational database
table. One of the attributes is called the determinant and the other attribute is called the
dependent. For each value of the determinant there is associated one and only one value of the
dependent.
If A is the determinant and B is the dependent then we say that A functionally determines B and
graphically represent this can be written as A -> B. The symbols A ->B can also be expressed as B
is functionally dependent on A.

Example
Since for A = 3 there is associated more than one value of B.

Functional dependency can also be defined as follows:
An attribute in a relational model is said to be functionally dependent on another attribute
in the table if it can take only one value for a given value of the attribute upon which it is
functionally dependent.
IRREDUCIBLE FUNCTION DEPENDING SET
Functional depending set S is irreducible if the set has three following properties:
 Each right set of a functional dependency of S contains only one attribute.

 Each left set of a functional dependency of S is irreducible. It means that reducing any
one attribute from left set will change the content of S(S will lose some information).
 Reducing any functional dependency will change the content of S .sets of functional
dependencies (FD) with these properties are also called Canionical(or)Minimal.
PROPERTIES OF FUNCTIONAL DEPENDENCIES

Given that X,Y,and Z are sets of attributes in a relation R,one can derive several propreties of
functional dependencies .Amongthe most important are Armstrong’s axioms, which are used in
database normalisation:
 SUBSET PROPERTY(Axiom of reflexivity) :If Y is a subset of X,then X->Y.

 AUGMENTATION(Axiom of augmentation):If X->Y,then XZ->YZ.
 TRANSITIVITY(Axiom of transitivity) :if X->Y and Y->Z,then X->Z.
FROM THESE RULES ,WE CAN DERIVE THESE SECONDARY RULES:
 UNION :If X->Y and X->Z,then X->YZ

 DECOMPOSITION :If X->YZ,then X->Y and X->Z.
 PSEUDOTRANSITIVITY: If X->Y and WY->Z then XW->Z
Equivalent sets of functional dependencies are called covers of each other . Every set of
functional dependencies has a canonical cover.
EXAMPLE: Suppose that we specify the following set F of obvious functional dependencies on
the relation schema.
TYPES OF FUNCTIONAL DEPENDENCY:
1. Trivial functional dependency.

2. Non trivial functional dependency.
3. Single valued functional dependency.
4. Multi valued functional dependency.
5. Fully functional dependency.
6. Partial functional dependency.
7. Transitive functional dependency.
Trivial dependency:
o A trivial functional dependency is a functional dependency of an attribute on a superset

of itself, ie if Y is a proper set of X then X->Y
Ex:
{Employee ID,Employee Address}->{Employee Address } is trivial,as is{Employee
Address }-{Employee Address}.
Non trivial dependency: If a functional dependency X->Y holds true where Y is not a subset of
X then this dependency is called non trivial Functional dependency.
An employee table with three attributes: emp_id, emp_name, emp_address.

Ex:
The following functional dependencies are non-trivial:

emp_id -> emp_name (emp_name is not a subset of emp_id)

emp_id -> emp_address (emp_address is not a subset of emp_id)
Single valued functional dependency
If there is a functional dependency from X toY , a single attribute of X uniquely determines the
attribute in Y
AB
Multi-valued dependency:
A multi –valued dependency is a constraint according If there is a functional dependency from X

to Y , a single attribute of X uniquely determines set of attributes in Y
AB,C,D.
Fully functional dependency
If Y is functionally dependet on X ie XY, then any proper subset of X cannot uniquely

determines the attribute in Y.
 Not functionally dependent on any proper subasets of X .{Employee Address }has a

functional dependency on {Employee ID, Skill},but not a full functional dependency
because it is also dependent on {Employee ID}.
In this Y is fully dependent on X if it is functionally dependent on all of X ,not just on a subset .
EXAMPLE:
PNO-PLOCATION: FULL FD.
{SSN,PNO}-HOURS:FULL FD.
Partial functional dependency
If Y is functionally dependet on X ie XY, then any proper subset of X can uniquely determine
attribute in Y
EX: {PNO,PNAME}-PLOCATION: PARTIAL FD
Transitive dependency:
 A transitive functional dependency is an indirect functional dependency,one in which

 X->Z only by virtue of X->Y and Y->Z.

 A transitive dependency is a functional dependency which holds by virtue of

transitivity. A transitive dependency can occur only in a relation that has three or more
attributes.
What is Normalization?
Normalization is a process that is carried out to minimize the redundancies that

are present in data in relational databases. This process will mainly divide large
tables in to smaller tables with fewer redundancies. These smaller tables will be
related to each other through well defined relationships. In a well normalized
datadase, any alteration or modification in data will requires modifying only a
single table.
• A properly normalized database should have the following characteristics
– Scalar values in each fields
– Absence of redundancy.
– Minimal use of null values.
– Minimal loss of information.

NORMAL FORMS:
The data in the database can be considered to be in one of the number of ‘normal
forms’.Basically the normal form of the data indicates how much redundancy is in
that data.
The normal forms have a strict ordering:
1.First Normal Form(1NF)
2.Second Normal Form(2NF)
3.Third Normal Form(3NF)
4.BCNF
5.Fourth Normal Form(4NF)
6.Fifth Normal Form(5NF)
Properties of Normalization:
1.No data value should be duplicated in different rows unnecessarily.
2.Avalue must be specified(and required) for every attribute in a row.
3.Each relation should be self-contained.
4.When a row is added to a relation,other relations in the database should not be

affected.
5.A value of an attribute in a tuple may be changed independent of other tuples in

the relation and other relations.
Normalization Benefits:
>Facilitates data integration.
>Reduces data redundancy.
>Provides a robust architecture for retrieving and maintaining data.

>Compliments data modeling.
>Reduces the chances of data anomalies occurring.
Goals of Normalization(purposes):
There are two goals of the normalization process:
>Eliminating redundant data(for example,storing the same data in more than one
table) and
>Ensuring data dependencies make sense(only storing related data in a table).
>Reduce the potential for anomalies.
These are worthy goals as they reduce the amount of space a database consumes
and ensure that is logically stored.
Advantages of Normalization:
>Avoid redundancy(same data stored many times in same/different tables).
>All the update anomalies and does not have any loss of data or inefficient data
update process.
>In this,a well organized database where all the tables are inter-related maintain
integrity and consistency of data.
>All data are stored efficiently since there is no redundancy.
>The entire database system remains over time as the database grows with least
redundancy and much durability.
Disadvantages of Normalization:
>Maintaining more tables is a bit tough.
>Nested queries over multiple tables gats tricky.
First Normal Form(1NF)

First Normal Form(1NF or Minimal Form) is a normal form used in database

normalization.
A table is in 1NF if and only if all the key attributes have been defined and it
contains no repeating groups. A relation R is in 1NF if and only if it has only
single –valued attributes(atomic value/data).
*What is Atomic Data?
>It is smallest piece of data that cannot or should not be further divided.
There are basically two rules associated with First Normal Form.
Rule 1:A column with atomic data cannot have several values of the
same type of data in that column.
Example 1: Let us consider the below employee table:
In the above table, there are multiple values for children’s name and birthday
columns.
In the above table satisfies Rule1 but not First Normal Form .It violates Rule2.We
should apply Rule2 to the value table to be in First normal Form.

Rule2:A table with atomic data cannot have multiple columns with the
same type of data.
Example :Consider the below Employees table:
The above table violates Rule2 as there are multiple columns

CHILD1,CHILD2,CHILD1-BDAY and CHILD2-BDAY with same type of data.
SECOND NORMAL FORM(2NF):

A table is in second normal form(2NF) if and only if is in 1NF and every non key
attribute is fully functional dependent on the whole of the primary key (i.e. there
are no partial dependencies.
In slightly more formal terms terms: a 1NF and is in 2NF if and only if all its non-
prime attributes are functionally dependent on the whole of a candidate key.(A
non-prime attribute is one that does not belong to any candidate key.)
Example1:second normal form requires that any non-key field be dependent upon
the entire key. For example, consider the student courses table below, where
StudentID and courseID form a compound primary key.
The student Name field does not depend at all on courseId,but only on StudentID.
Course Location has dependency on StudentID ,but only on CourseID. This data
should be split into three tables as follows:
In the above table, Student ID is primary key and Name column is functionally
dependent on Student ID.

In the above table,CourseID is primary key and CourseLocation is functionally

dependent on CourseID.
In this example,grade was the only field dependent on the combination of

StudentID and CourseID.Here,StudentID and CourseId is composite key.
THIRD NORMAL FORM(3NF):

The Third Normal Form(3NF)is a normal form used in database normalization. A
relation R in 3NF if and only if it is in 2NF and every non-key column does not
depend on another non-key column. All non-prime attributes of R must be non-
transitively functionally dependent on a key of the relation.
3NF was originally defined by E.F.Codd in 1971. Codd’s definition states a table is
in 3NF if and only if both of the following conditions hold:
*The relation R(table) is in second normal form(2NF).

*Every non-prime attribute of R is non-transitively dependent(i.e. directly

dependent ) on every candidate key of R.
Consider the following example courseSections Table:
The professor is uniquely identified by the CourseID and Section of the course.
However,Professor Name depends on professorID and has no relation to CourseId
or Section. This data is properly stored as follows:
By splitting the data into two tables, the transitive dependency is removed.
BOYCE CODD NORMAL FORM(BCNF):

BOYCE CODD NORMAL CODD(BCNF)is a normal form used in database
normalization.It is a slightly stronger version of the third normal form(3NF).BCNF

Was developed in1974 by Raymond F.Boyce and Edgar F.Codd to address certain
types of anomaly not dealt with by 3NF as originally defined.Only in rare cases
does a 3NF table not meet the requirements of BCNF.
A relation is in BCNF, if and only if ,every determinant is a candidate key.The

difference between 3NF and BCNF is that for a functional dependency AB,3NF
allows this dependency in relation if B is primary key attribute and A is not a
candidate key, where as BCNF insists that for this dependency to remain in a
relation,A must be candidate key/primary key .Depending on what it’s functional
dependencies are,a 3NF table with two or more overlapping candidate key may or
may not be in BCNF.
Rules for BCNF
>A relation is in BCNF if every determinant is a candidate key.
>Recall that not all the determinants are keys.
>Those determinants that are keys we initially call candidate keys.
>Eventually,we select a single candidate key to be the key for the relation.
Ex:

Prime and non prime attribute from table
Can be decomposed like this

Ex-2
The relation FUNDS is in 1NF, 2NF, 3NF because no multiple values, no partial
dependencies and no transitive relation.
Steps to normalize a relation in BCNF
1.List all the determinants

2. check whether each determinant can act as a key(candidate key)
3. for any determinant that is not a candidate key, create a new rew relation from
the functional dependency. Retain the determinant in the original relation
In the above example:
FUNDS(Fund ID, Investment type, Manager)
1.Determinants are
i. Fund Id, Investment type
ii. Fund Id , Manager
iii. Manager
2. The determinant which act as keys are i and ii but not Manager because using
only that we can’t identify a tuple in relation.
3. create a new relation MANAGERS from functional dependency
We have retained the determinant Manager int the MANAGERS and each of
relations are in 1NF , 2NF , 3NF AND BCNF
FOURTH NORMAL FORM(4NF):

It is based on multi valued dependencies, which is violated when a relation is

undesirable multi-valued dependencies itcan be used to identify and decompose
such relations .
A relation scheme R is in 4NF with respect to a set of dependencies F if,for every

non trivial multi-valued dependency X->F, X is a super key for R. Under fourth
normal form a record should not contain two are more independemt multi valued
facts and should satisfy 3NF.
The fourth normal form removes all multi-valued dependencies.
>A relation is in 4NF if it is in BCNF and it contains no multi-valued

dependencies.
>Multi-valued dependency:A type of functional dependency where the determinant
Can determine more than one value.
FIFTH NORMAL FORM(5NF):

A relation R is in 5NF also called projection join normal form(PJ/NF) if and only
if every join dependency in R is implied by the candidate keys of R.
Fifth normal form(5NF) is a level of database normalization designed to reduce

redundancy in a relational databases by decomposing the related multiple
relationships. A table is said to be in the 5NF if and only if every join dependency
in it is implied by the candidate keys. Ie. The relations should be non additive and
lossless.

DBMS ch-4

Uploaded by

Document Informationclick to expand document information

Document Informationclick to expand document information

Copyright:

Available Formats

DBMS ch-4

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

DBMS ch-4

Uploaded by

Copyright:

Available Formats

Seshadripuram College Database Management Systems (DSC-4)

Normalization is a set of rules and techniques concerned with:

Relational Model constraints

Some of the constraints are:

Goodness of Relational DB Design

It specifies how to interpret the attribute values stored in tuple

How to relate one tuple to another

Department of CS (BCA) Page 1

Ex: Each tuple in a relation should represent one entity instance

Only foreign keys should be used to refer to other entities

2. Redundant information in tuples and anomalies

Different types of anomalies are

1. Insertion Anomaly: The insertion anomaly occurs when a new record is

2. Deletion Anomaly: The deletion anomaly occurs when a record is deleted

Department of CS (BCA) Page 2

3. Modification/ Update Anomaly

Null Values in Tuples

Department of CS (BCA) Page 3

 The attributes does not apply to the tuple

A relation scheme R can be decomposed into a collection of relation schemes

There are 2 subsets

R1(x,z) and R2(Y,Z)

If we union R1 and R2(R1 U R2),we get R

 Eliminate redundancy by decomposing a relation into several relations in a higher normal

PROBLEM WITH DECOMPOSITION:

Department of CS (BCA) Page 4

LOSS LESS DECOMPOSITION:

A decomposition {R1,R2…..Rn} of a relation R is called a lossless decomposition for R if the

Department of CS (BCA) Page 5

Since for A = 3 there is associated more than one value of B.

IRREDUCIBLE FUNCTION DEPENDING SET

 Each right set of a functional dependency of S contains only one attribute.

PROPERTIES OF FUNCTIONAL DEPENDENCIES

Department of CS (BCA) Page 6

 SUBSET PROPERTY(Axiom of reflexivity) :If Y is a subset of X,then X->Y.

FROM THESE RULES ,WE CAN DERIVE THESE SECONDARY RULES:

 UNION :If X->Y and X->Z,then X->YZ

TYPES OF FUNCTIONAL DEPENDENCY:

1. Trivial functional dependency.

o A trivial functional dependency is a functional dependency of an attribute on a superset

An employee table with three attributes: emp_id, emp_name, emp_address.

The following functional dependencies are non-trivial:

Department of CS (BCA) Page 7

emp_id -> emp_name (emp_name is not a subset of emp_id)

A multi –valued dependency is a constraint according If there is a functional dependency from X

Fully functional dependency

If Y is functionally dependet on X ie XY, then any proper subset of X cannot uniquely

 Not functionally dependent on any proper subasets of X .{Employee Address }has a

In this Y is fully dependent on X if it is functionally dependent on all of X ,not just on a subset .

PNO-PLOCATION: FULL FD.

Partial functional dependency

EX: {PNO,PNAME}-PLOCATION: PARTIAL FD

 A transitive functional dependency is an indirect functional dependency,one in which

Department of CS (BCA) Page 8

 A transitive dependency is a functional dependency which holds by virtue of

Normalization is a process that is carried out to minimize the redundancies that

• A properly normalized database should have the following characteristics

– Scalar values in each fields