[go: up one dir, main page]

0% found this document useful (0 votes)
27 views21 pages

DBMS - Unit Iii

Download as pdf or txt
Download as pdf or txt
Download as pdf or txt
You are on page 1/ 21

UNIT III

FUNCTIONAL DEPENDENCY :

A functional dependency (FD) is a relationship between two attributes, typically between the primary key
and other non-key attributes within a table. A functional dependency denoted by X Y, is an association between
two sets of attribute X and Y. Here, X is called X the determinant, and Y is called the dependent.

For example, If we know the value of student roll number, we can obtain student address, marks etc. By
this, we say that student address and marks is functionally dependent on student roll number.

Student_roll_no  address, marks

Types of functional dependency

The following are types functional dependency in DBMS:

1. Fully-Functional Dependency 2. Partial Dependency 3. Transitive Dependency 4. Trivial Dependency


5. Multivalued Dependency

1. Full functional Dependency

A functional dependency XY is said to be a full functional dependency, if removal of any attribute A from
X, the dependency does not hold any more. i.e. Y is fully functional dependent on X, if it is Functionally
Dependent on X and not on any of the proper subset of X.
For example,
{Emp_num,Proj_num}  Hour
Is a full functional dependency. Here, Hour is the working time by an employee in a project.

2. Partial functional Dependency

A functional dependency XY is said to be a partial functional dependency, if after removal of any
attribute A from X, the dependency does not holds. i.e. Y is dependent on a proper subset of X. So X is
partially dependent on X. For example,
If {Emp_num,Proj_num} → Emp_name but also Emp_num → Emp_name then Emp_name is partially
functionally dependent on {Empl_num,Proj_num}.

3. Transitive dependency

A functional dependency is X  Z is said to be a transitive functional dependency if there exists the


functional dependencies X  Y and Y  Z. i.e. it is an indirect relationship. For example,
EMP_NUM  JOB_CLASS
is a transitive dependency which comes from EMP_NUM  JOB_CLASS and
JOB_CLASS  CHG_HOUR
1
4. Trivial functional dependency
A functional dependency X  Y is said to be a trivial functional dependency if Y is a subset of X.

For example,
{Emp_num,Emp_name}  Emp_num
is a trivial functional dependency since Emp_num is a subset of {Emp_num,Emp_name}.

5. Multivalued dependency

Multivalued dependency occurs in the situation where there are multiple independent multivalued attributes
in a single table. A multivalued dependency is a complete constraint between two sets of attributes in a
relation. It requires that certain tuples be present in a relation.

Example: Suppose there is a bike manufacturer company which produces two colors(white and black) of each
model every year.

BIKE_MODEL MANUF_YEAR COLOR

M2011 2008 White

M2001 2008 Black

M3001 2013 White

M3001 2013 Black

M4006 2017 White

M4006 2017 Black

Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.
In this case, these two columns can be called as multivalued dependent on BIKE_MODEL. The representation
of these dependencies is shown below:
BIKE_MODEL → → MANUF_YEAR BIKE_MODEL
→ → COLOR

2
This can be read as "BIKE_MODEL multidetermined MANUF_YEAR" and "BIKE_MODEL multidetermined
COLOR".

NORMALIZATION
Introduction:
The process of normalization was first developed by E. F. Codd. Database designed based on
E-R model may have some amount of inconsistency, uncertainty and redundancy.

Normalization is the process of efficiently organizing data in a database. There are two goals
of the normalization process: eliminating redundant data and ensuring data dependencies
make sense (only storing related data in a table).

Pitfalls in Relational Database Design:

Relational database design requires that we find a “good” collection of relation schemas. A
bad design may lead to Repetition of Information, Inability to represent certain information. A bad
or unformatted relational database design is called pitfalls.

Decomposition:

The process of taking a relation and splitting it up into multiple relations is called
decomposition. The purpose of decomposing relations is to avoid anomalies.

Data modification anomalies can be categorized into three types:

o Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new tuple into a
relationship due to lack of data.
o Deletion Anomaly: The delete anomaly refers to the situation where the deletion of data results
in the unintended loss of some other important data.
o Updatation Anomaly: The update anomaly is when an update of a single data value requires
multiple rows of data to be updated.

Normalization:
Normalization is defined as a step by step process of decomposing a complex relation into
simple and stable relations.

Advantages of Normalization:
 Decreased storage space
 Faster search performance
 Improved data integrity

3
 Reduces data redundancy in a database
 Remove Insert, Delete, Update anomalies during database activates
 Flexible structure Duplication of Data
Disadvantages of Normalization:
 On Normalizing the relations to higher normal forms i.e. 4NF, 5NF the performance
degrades.
 It is very time consuming and difficult process in normalizing relations of higher degree.

The Normal Forms


The database community has developed a series of guidelines for ensuring that databases
are normalized. These are referred to as normal forms and are numbered from one (the lowest
form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or
5NF or PJNF). The different stages of Normalization are known as “Normal Form”.
Normalization process is based on functional dependency.

4
The Normal Form

NOTE: BCNF – Boyce Code Normal Form


PJNF – Projection Join Normal Form (5NF)
FIRST NORMAL FORM (1 NF)
Definition:

A relation schema R is in 1 NF if all the attributes of the relation R are atomic in nature.

First normal form: Eliminates repeating groups by putting each into a separate table and
connecting them with a one-to-many relationship. Make a separate table for each set of related
attributes and uniquely identify each record with a primary key.

RULES:

 Eliminate duplicative columns from the same table.


 Create separate tables for each group of related data and identify each row with a unique
column or set of columns (the Primary key)

Employee (unnormalized)

emp_no name dept_no dept_name skills


1 Kevin Jacobs 201 R&D C, Perl, Java
2 Barbara Jones 224 IT Linux, Mac
3 Jake Rivera 201 R&D DB2, Oracle, Java
Problem: Employee relation schema with lack of atomicity in skills (attribute)

Solution: Make a separate table for each set of attributes with a primary key

5
Employee (1NF)

emp_no name dept_no dept_name skills


1 Kevin Jacobs 201 R&D C
1 Kevin Jacobs 201 R&D Perl
1 Kevin Jacobs 201 R&D Java
2 Barbara Jones 224 IT Linux
2 Barbara Jones 224 IT Mac
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java
Benefits: Easier to query/sort the data, more scalable and each row can be identified for updating.

Disadvantage: This solution has the disadvantage of introducing redundancy in relation. So we


move on to 2NF.
SECOND NORMAL FORM (2 NF)
Definition:
A relation schema R is in 2NF if and only if it is in 1NF and every non-prime attribute in R is
fully functionally dependent on primary key.

In Second Normal Form: Each attribute must be functionally dependent on the primary key.
Functional dependence - the property of one or more attributes that uniquely determines the value
of other attributes. Any non-dependent attributes are moved into a smaller (subset) table.

RULES:

 The database must meet all the requirements of the first normal form.
 Redundant data across multiple rows of a table must be moved to a separate table. And
create relationships between the tables through foreign key. Example:

Employee (1NF)

emp_no name dept_no dept_name skills


1 Kevin Jacobs 201 R&D C
1 Kevin Jacobs 201 R&D Perl
1 Kevin Jacobs 201 R&D Java
2 Barbara 224 IT Linux
Jones
2 Barbara 224 IT Mac
Jones

6
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java

Name, dept_no, and dept_name are functionally dependent on emp_no.

(emp_no -> name, dept_no, dept_name)


Problem: Skills(attribute) is not functionally dependent on emp_no since it is not unique to each
emp_no.

Solution: Make a separate table like emp _ details (emp_no, name, dept_no, dept_name) and
skill_ details (emp_no, skills) and to set a primary key for emp_no attribute.

The resulting tables must be related to each other by use of foreign key.

Employee (2NF) Skills (2NF)

emp_noskills
1 C
emp na dep odept_ 1 Perl
_no me t_n name 1 Java
2 Linux
1 Kevi 201 R&D 2 Mac
n 3 DB2
Jaco 3 Oracle
bs 3 Java

2 Barb 224 IT
ara
Jone
s
3 Jake 201 R&D
Rive
ra
7
Now that we have removed the duplicative data from the employee table, and it have to be moved
to a separate table. We’ve satisfied the first rule of second normal form. We still need to use a
foreign key to tie the two tables together. The result table looks like.

Employee (2NF)
Emp_no Name Dept_no Dept_name Skills

Benefits:

 Decrease storage efficiency


 Less data repetition

Disadvantage:

This solution has the disadvantage of introducing redundancy in skills relation. So we move on to
3NF.

THIRD NORMAL FORM (3 NF)


Definition:

A table is in 3Nf if it is in 2NF and there are no transitive dependencies.


The Third Normal Form's objective is to remove data in a table that is not dependent on the
primary key. That is, any field that is dependent not only on the primary key but also on another
field must be moved to another table.

RULES:

 Already meet the requirements of both 1NF and 2NF.


 Remove columns that are not fully dependent upon the primary key.

Example:

8
Employee (2NF)

emp_no name dept_nodept_name


1 Kevin 201 R&D
Jacobs
2 Barbara 224 IT
Jones
3 Jake Rivera 201 R&D
Dept_no and dept_name are functionally dependent on emp_no however, department can be
considered a separate entity.

Employee (3NF) Department (3NF)

emp_nam dept_ dept_nodept_name


no e no 201 R&D
1 Kevi 201 224 IT
n
Jaco
bs
2 Barb 224
ara
Jone
s
3 Jake 201
River
a

9
Here #emp_no, #dept_no are primary key attribute.

Benefits: No extraneous(coming from the outside)data.


Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and is also
known as 3.5 Normal Form.

Rules for BCNF

For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:

1. It should be in the Third Normal Form.


2. And, for any dependency A → B, A should be a super key.

The second point sounds a bit tricky, right? In simple words, it means, that for a dependency A
→ B, A cannot be a non-prime attribute, if B is a prime attribute.

Example
Below we have a college enrolment table with columns student_id, subject and professor.

As you can see, we have also added some sample data to the table. In
the table above:

• One student can enrol for multiple subjects. For example, student with student_id 101,
has opted for subjects - Java & C++
• For each subject, a professor is assigned to the student.
• And, there can be multiple professors teaching one subject like we have for Java.

What do you think should be the Primary Key?


Well, in the table above student_id, subject together form the primary key, because using
student_id and subject, we can find all the columns of the table.
10
One more important point to note here is, one professor teaches only one subject, but one
subject may have two different professors.
Hence, there is a dependency between subject and professor here, where subject depends on the
professor name.

This table satisfies the 1st Normal form because all the values are atomic, column names are
unique and all the values stored in a particular column are of same domain.
This table also satisfies the 2nd Normal Form as their is no Partial Dependency.
And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
But this table is not in Boyce-Codd Normal Form.

Why this table is not in BCNF?

In the table above, student_id, subject form primary key, which means subject column is a prime
attribute.
But, there is one more dependency, professor → subject.
And while subject is a prime attribute, professor is a non-prime attribute, which is not allowed by
BCNF.

How to satisfy BCNF?

To make this relation(table) satisfy BCNF, we will decompose this table into two tables,
student table and professor table.
Below we have the structure for both the tables.
Student Table

11
And now, this relation satisfy Boyce-Codd Normal Form. In the next tutorial we will learn about the
Fourth Normal Form.

Fourth Normal Form (4NF)


Fourth Normal Form comes into picture when Multi-valued Dependency occur in any relation.

Rules for 4th Normal Form

For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:

1. It should be in the Boyce-Codd Normal Form.


2. And, the table should not have any Multi-valued Dependency.
Let's try to understand what multi-valued dependency is in the next section.

What is Multi-valued Dependency?

A table is said to have multi-valued dependency, if the following conditions are true,

1. For a dependency A → B, if for a single value of A, multiple value of B exists, then the table may
have multi-valued dependency.
2. Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B, then B and C
should be independent of each other.

If all these conditions are true for any relation(table), it is said to have multi-valued dependency.

Example

Below we have a college enrolment table with columns s_id, course and hobby.

As you can see in the table above, student with s_id 1 has opted for two courses, Science and Maths,
and has two hobbies, Cricket and Hockey.

12
You must be thinking what problem this can lead to, right?

Well the two records for student with s_id 1, will give rise to two more records, as shown below,
because for one student, two hobbies exists, hence along with both the courses, these hobbies should
be specified.

And, in the table above, there is no relationship between the columns course and hobby. They are
independent of each other.

So there is multi-value dependency, which leads to un-necessary repetition of data and other
anomalies as well.

How to satisfy 4th Normal Form?

To make the above relation satify the 4th normal form, we can decompose the table into 2 tables.

CourseOpted Table

13
Now this relation satisfies the fourth normal form.

14
15
MULTIVALUED DEPENDENCY (BCA)
o Multivalued dependency occurs when two attributes in a table are independent of each other but, both
depend on a third attribute.

o A multivalued dependency consists of at least two attributes that are dependent on a third attribute that's
why it always requires at least three attributes.

Example: Suppose there is a bike manufacturer company which produces two colors(white and black) of
each model every year.

Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each
other.
16
In this case, these two columns can be called as multivalued dependent on BIKE_MODEL. The
representation of these dependencies is shown below:

1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR

This can be read as "BIKE_MODEL multidetermined MANUF_YEAR" and


"BIKE_MODEL multidetermined COLOR".

JOIN DEPENDENCIES (JD) (BCA)


 The anomalies of MVDs and are eliminated by join dependency (JD) and 5NF.

 A join dependency (JD) can be said to exist if the join of R1 and R2 over C is equal to relation R.

 Where, R1 and R2 are the decompositions R1(A, B, C), and R2 (C,D) of a given relations R (A, B,
C, D).

 R1 and R2 is a lossless decomposition of R.

 In other words, *(A, B, C, D), (C, D) will be a join dependency of R if the join of the join's
attributes is equal to relation R.

 *( R1, R2, R3 ....) indicates that relations R1, R2, R3 and soon are a join dependency (JD) of R.

 Relation R to satisfy a JD *( R1, R2,….,RN) is that  R = R1 U R2 U …..U RN.

 Thus, whenever we decompose a relation R into R1 = XUY and R2 = (R - Y) based on an MVD


X  Y that holds in relation R, the decomposition has lossless join property.

 Therefore, lossless-join dependency can be defined as a property of decomposition, which


ensures that no spurious tuples are generated when relations are returned through a natural
join operation.

Example 1

 relation PERSONS_ON_JOB_SKILLS, as shown in above fig .

 This relation can be decomposed into three relations namely, HAS_SKILL, NEEDS_SKILL cind
ASSIGNED_TO_JOBS.

17
 fig. the join dependencies of decomposed relations.

 if we join decomposed relations HAS_SKILL and NEEDS_SKILL, a relation


CAN_USE_JOB_SKILL is obtained, as shown in above Fig.

 This relation stores the data about persons who have skills applicable to a particular job. But,
each person who has a skill required for a particular job need not be assigned to that job.

Thus, redundant tuples (rows) that show unnecessary SKILL-TYPE and JOB combinations are
removed by joining with relation NEEDS_SKILL.

Relational Decomposition (BCA)


o When a relation in the relational model is not in appropriate normal form then the decomposition of a
relation is required.

o In a database, it breaks the table into multiple tables. o If the relation has no proper decomposition,

then it may lead to problems like loss of information.

o Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies,
and redundancy.
18
Types of Decomposition

Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the decomposition will be lossless.

o The lossless decomposition guarantees that the join of relations will result in the same relation as it was
decomposed. o The relation is said to be lossless decomposition if natural joins of all the decomposition give the
original relation.

Example:

EMPLOYEE_DEPARTMENT table:

19
Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation
will look like:

Employee Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

20
60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition.

Dependency Preserving
o It is an important constraint of the database. o In the dependency preservation, at least one

decomposed table must satisfy every dependency.

o If a relation R is decomposed into relation R1 and R2, then the dependencies of R either must be a part
of R1 or R2 or must be derivable from the combination of functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The
relational R is decomposed into R1(ABC) and R2(AD) which is dependency preserving because FD A-
>BC is a part of relation R1(ABC).

21

You might also like