(CAM204-1C) Handling Data Using SQL
Unit 3: Functional Dependencies and Normalization for Relational Databases
1. Functional Dependencies
2. Concepts of Dependency, Transitive Dependency and
Armstrong Axioms
3. Normal Forms Based on primary Keys,
4. General Definitions of Second and Third Normal Forms,
Boyce-Codd Normal Form
5. Example of Normalization
Functional Dependencies
• Functional Dependency avoids data redundancy. That is, same data do
not repeat at multiple locations in that database
• It is a constraint that determines the relation of one attribute to another
attribute in a Database Management System (DBMS)
• It helps you to maintain the quality of data in the database
• It helps you to defined meanings and constraints of databases
• It helps you to identify bad designs
• It helps you to find the facts regarding the database design
Four types of functional dependency are
1) Multivalued
2) Trivial
3) Non-trivial
4) Transitive
The functional dependency is a relationship that exists between two attributes.
It typically exists between the primary key and non-key attribute within a table.
For any relation R, attribute Y is functionally dependent on attribute X (usually
the PK), if for every valid instance of X, that value of X uniquely determines the
value of Y.
X → Y (X determines Y)
The left side of FD is known as a determinant the right side of the production is
known as a dependent.
Assume we have an employee table with attributes: Emp_Id, Emp_Name,
Emp_Address.
Here Emp_Id attribute can uniquely identify the Emp_Name attribute of
employee table because if we know the Emp_Id, we can tell that employee name
associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
If we can check functional dependency A—>B then always all A value get one B
value Which can be never change. IF Value can be change then there is no
functional dependency.
Types of Functional dependency
Trivial functional dependency
o A → B has trivial functional dependency if B is a subset of A. (ACB)
o The following dependencies are also trivial like: A → A, B → B
Non-trivial functional dependency
o A → B has a non-trivial functional dependency if B is not a subset of
A.(ACB)
o When A Intersection B is NULL, then A → B is called as complete non-
trivial.
A transitive Functional Dependency : it is type of functional dependency which
happens when it is indirectly formed by two functional dependencies.
{Company} -> {CEO} (if we know the compay, we know its CEO's name)
{CEO } -> {Age} If we know the CEO, we know the Age
Therefore according to the rule of rule of transitive dependency:
{ Company} -> {Age} should hold, that makes sense because if we know the
company name, we can know his age.
Multivalued dependency occurs in the situation where there are multiple
independent multivalued attributes in a single table.
A multivalued dependency is a complete constraint between two sets of
attributes in a relation.
Example: car(car_model,year,colour) year and color are independent of each
other but dependent on car_model.
In this example, these two columns are said to be multivalue dependent on
car_model.
This dependence can be represented like this:
car_model -> year
car_model-> colour
Armstrong’s Axioms in Functional Dependency
• Axioms is a set of inference rules used to infer all the functional
dependencies on a relational database.
• Introduced by William W. Armstrong, that is used to test the logical
implication of functional dependencies.
• The Armstrong's axioms are the basic inference rule.
• Armstrong's axioms are used to conclude functional dependencies on a
relational database.
• Using the inference rule, we can derive additional functional dependency
from the initial set.
• If F is a set of functional dependencies then the closure of F, denoted as ,
F+,is the set of all functional dependencies logically implied by F.
• Armstrong’s Axioms are a set of rules, that when applied repeatedly,
generates a closure of functional dependencies.
1. Reflexivity Rule
In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y then X → Y
X = {a, b, c, d, e}
Y = {a, b, c}
2. Augmentation Rule
The augmentation is also called as a partial dependency. In augmentation, if
X determines Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ
For R(ABCD), if A → B then AC → BC
3. Transitive Rule
In the transitive rule, if X determines Y and Y determine Z, then X must also
determine Z.
If X → Y and Y → Z then X → Z
4. Union Rule
Union rule says, if X determines Y and X determines Z, then X must also
determine Y and Z.
If X → Y and X → Z then X → YZ
5. Decomposition Rule
Decomposition rule is also known as project rule. It is the reverse of union
rule.
This Rule says, if X determines Y and Z, then X determines Y and X
determines Z separately.
If X → YZ then X → Y and X → Z
1. X → YZ (given)
2. YZ → Y (using IR1 Rule)
3. X → Y (using IR3 on 1 and 2)
6. Pseudo transitive Rule
In Pseudo transitive Rule, if X determines Y and YZ determines W, then XZ
determines W.
If X → Y and YZ → W then XZ → W
Normalization
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set
of relations. It is also used to eliminate the undesirable characteristics like
Insertion, Update and Deletion Anomalies.
o Normalization divides the larger table into the smaller table and links
them using relationship.
o The normal form is used to reduce redundancy from the database table.
o Normalization is the process of splitting relations into well structured
relations that allow users to insert, delete, and update tuples without
introducing database.
o Without normalization many problems can occur when trying to load an
integrated conceptual model into the DBSM.
o These problems arise from relations that are generated directly from user
views are called anomalies.
How to reduce data redundancy?
o Normalization is a process of reducing redundancies of data in a database.
Normalization is a technique that is used when designing and redesigning
a database. Normalization is a process or set of guidelines used to
optimally design a database to reduce redundant data.
Types of Normal Forms
First Normal Form (1NF)
o A relation will be 1NF if it contains an atomic value.
o It states that an attribute of a table cannot hold multiple values. It must
hold only single-valued attribute.
o First normal form disallows the multi-valued attribute, composite
attribute, and their combinations.
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute
EMP_PHONE.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, UP
9064738238
20 Harry 8574783832 Bihar
12 Sam 7390372389, Punjab
8589830302
The decomposition of the EMPLOYEE table into 1NF has been shown below
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
Second Normal Form (2NF)
In the 2NF, relational must be in 1NF.
In the second normal form, all non-key attributes are fully functional
dependent on the primary key
Example: Let's assume, a school can store the data of teachers and the
subjects they teach. In a school, a teacher can teach more than one
subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
Third Normal Form (3NF)
o A relation will be in 3NF if it is in 2NF and not contain any transitive
partial dependency.
o 3NF is used to reduce the data duplication. It is also used to achieve the
data integrity.
o If there is no transitive dependency for non-prime attributes, then the
relation must be in third normal form.
A relation is in third normal form if it holds atleast one of the following
conditions for every non-trivial function dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate
key.
Example:
EMPLO YEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
Super key in the table above:
1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so
on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID
are non-prime.
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP
dependent on EMP_ID. The non-prime attributes (EMP_STATE,
EMP_CITY) transitively dependent on super key(EMP_ID). It violates the
rule of third normal form.
That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
Boyce Codd normal form (BCNF)
o BCNF is the advance version of 3NF. It is stricter than 3NF.
o A table is in BCNF if every functional dependency X → Y, X is the super
key of the table.
o For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
Example: Let's assume there is a company where employees work in more
than one department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549
In the above table Functional dependencies are as follows:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP-ID, EMP-DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are
keys.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because left side part of both the functional dependencies
is a key.