DBMS - Unit Iii
DBMS - Unit Iii
DBMS - Unit Iii
FUNCTIONAL DEPENDENCY :
A functional dependency (FD) is a relationship between two attributes, typically between the primary key
and other non-key attributes within a table. A functional dependency denoted by X Y, is an association between
two sets of attribute X and Y. Here, X is called X the determinant, and Y is called the dependent.
For example, If we know the value of student roll number, we can obtain student address, marks etc. By
this, we say that student address and marks is functionally dependent on student roll number.
A functional dependency XY is said to be a full functional dependency, if removal of any attribute A from
X, the dependency does not hold any more. i.e. Y is fully functional dependent on X, if it is Functionally
Dependent on X and not on any of the proper subset of X.
For example,
{Emp_num,Proj_num} Hour
Is a full functional dependency. Here, Hour is the working time by an employee in a project.
A functional dependency XY is said to be a partial functional dependency, if after removal of any
attribute A from X, the dependency does not holds. i.e. Y is dependent on a proper subset of X. So X is
partially dependent on X. For example,
If {Emp_num,Proj_num} → Emp_name but also Emp_num → Emp_name then Emp_name is partially
functionally dependent on {Empl_num,Proj_num}.
3. Transitive dependency
For example,
{Emp_num,Emp_name} Emp_num
is a trivial functional dependency since Emp_num is a subset of {Emp_num,Emp_name}.
5. Multivalued dependency
Multivalued dependency occurs in the situation where there are multiple independent multivalued attributes
in a single table. A multivalued dependency is a complete constraint between two sets of attributes in a
relation. It requires that certain tuples be present in a relation.
Example: Suppose there is a bike manufacturer company which produces two colors(white and black) of each
model every year.
Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.
In this case, these two columns can be called as multivalued dependent on BIKE_MODEL. The representation
of these dependencies is shown below:
BIKE_MODEL → → MANUF_YEAR BIKE_MODEL
→ → COLOR
2
This can be read as "BIKE_MODEL multidetermined MANUF_YEAR" and "BIKE_MODEL multidetermined
COLOR".
NORMALIZATION
Introduction:
The process of normalization was first developed by E. F. Codd. Database designed based on
E-R model may have some amount of inconsistency, uncertainty and redundancy.
Normalization is the process of efficiently organizing data in a database. There are two goals
of the normalization process: eliminating redundant data and ensuring data dependencies
make sense (only storing related data in a table).
Relational database design requires that we find a “good” collection of relation schemas. A
bad design may lead to Repetition of Information, Inability to represent certain information. A bad
or unformatted relational database design is called pitfalls.
Decomposition:
The process of taking a relation and splitting it up into multiple relations is called
decomposition. The purpose of decomposing relations is to avoid anomalies.
o Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new tuple into a
relationship due to lack of data.
o Deletion Anomaly: The delete anomaly refers to the situation where the deletion of data results
in the unintended loss of some other important data.
o Updatation Anomaly: The update anomaly is when an update of a single data value requires
multiple rows of data to be updated.
Normalization:
Normalization is defined as a step by step process of decomposing a complex relation into
simple and stable relations.
Advantages of Normalization:
Decreased storage space
Faster search performance
Improved data integrity
3
Reduces data redundancy in a database
Remove Insert, Delete, Update anomalies during database activates
Flexible structure Duplication of Data
Disadvantages of Normalization:
On Normalizing the relations to higher normal forms i.e. 4NF, 5NF the performance
degrades.
It is very time consuming and difficult process in normalizing relations of higher degree.
4
The Normal Form
A relation schema R is in 1 NF if all the attributes of the relation R are atomic in nature.
First normal form: Eliminates repeating groups by putting each into a separate table and
connecting them with a one-to-many relationship. Make a separate table for each set of related
attributes and uniquely identify each record with a primary key.
RULES:
Employee (unnormalized)
Solution: Make a separate table for each set of attributes with a primary key
5
Employee (1NF)
In Second Normal Form: Each attribute must be functionally dependent on the primary key.
Functional dependence - the property of one or more attributes that uniquely determines the value
of other attributes. Any non-dependent attributes are moved into a smaller (subset) table.
RULES:
The database must meet all the requirements of the first normal form.
Redundant data across multiple rows of a table must be moved to a separate table. And
create relationships between the tables through foreign key. Example:
Employee (1NF)
6
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java
Solution: Make a separate table like emp _ details (emp_no, name, dept_no, dept_name) and
skill_ details (emp_no, skills) and to set a primary key for emp_no attribute.
The resulting tables must be related to each other by use of foreign key.
emp_noskills
1 C
emp na dep odept_ 1 Perl
_no me t_n name 1 Java
2 Linux
1 Kevi 201 R&D 2 Mac
n 3 DB2
Jaco 3 Oracle
bs 3 Java
2 Barb 224 IT
ara
Jone
s
3 Jake 201 R&D
Rive
ra
7
Now that we have removed the duplicative data from the employee table, and it have to be moved
to a separate table. We’ve satisfied the first rule of second normal form. We still need to use a
foreign key to tie the two tables together. The result table looks like.
Employee (2NF)
Emp_no Name Dept_no Dept_name Skills
Benefits:
Disadvantage:
This solution has the disadvantage of introducing redundancy in skills relation. So we move on to
3NF.
RULES:
Example:
8
Employee (2NF)
9
Here #emp_no, #dept_no are primary key attribute.
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:
The second point sounds a bit tricky, right? In simple words, it means, that for a dependency A
→ B, A cannot be a non-prime attribute, if B is a prime attribute.
Example
Below we have a college enrolment table with columns student_id, subject and professor.
As you can see, we have also added some sample data to the table. In
the table above:
• One student can enrol for multiple subjects. For example, student with student_id 101,
has opted for subjects - Java & C++
• For each subject, a professor is assigned to the student.
• And, there can be multiple professors teaching one subject like we have for Java.
This table satisfies the 1st Normal form because all the values are atomic, column names are
unique and all the values stored in a particular column are of same domain.
This table also satisfies the 2nd Normal Form as their is no Partial Dependency.
And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
But this table is not in Boyce-Codd Normal Form.
In the table above, student_id, subject form primary key, which means subject column is a prime
attribute.
But, there is one more dependency, professor → subject.
And while subject is a prime attribute, professor is a non-prime attribute, which is not allowed by
BCNF.
To make this relation(table) satisfy BCNF, we will decompose this table into two tables,
student table and professor table.
Below we have the structure for both the tables.
Student Table
11
And now, this relation satisfy Boyce-Codd Normal Form. In the next tutorial we will learn about the
Fourth Normal Form.
For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:
A table is said to have multi-valued dependency, if the following conditions are true,
1. For a dependency A → B, if for a single value of A, multiple value of B exists, then the table may
have multi-valued dependency.
2. Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B, then B and C
should be independent of each other.
If all these conditions are true for any relation(table), it is said to have multi-valued dependency.
Example
Below we have a college enrolment table with columns s_id, course and hobby.
As you can see in the table above, student with s_id 1 has opted for two courses, Science and Maths,
and has two hobbies, Cricket and Hockey.
12
You must be thinking what problem this can lead to, right?
Well the two records for student with s_id 1, will give rise to two more records, as shown below,
because for one student, two hobbies exists, hence along with both the courses, these hobbies should
be specified.
And, in the table above, there is no relationship between the columns course and hobby. They are
independent of each other.
So there is multi-value dependency, which leads to un-necessary repetition of data and other
anomalies as well.
To make the above relation satify the 4th normal form, we can decompose the table into 2 tables.
CourseOpted Table
13
Now this relation satisfies the fourth normal form.
14
15
MULTIVALUED DEPENDENCY (BCA)
o Multivalued dependency occurs when two attributes in a table are independent of each other but, both
depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a third attribute that's
why it always requires at least three attributes.
Example: Suppose there is a bike manufacturer company which produces two colors(white and black) of
each model every year.
Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each
other.
16
In this case, these two columns can be called as multivalued dependent on BIKE_MODEL. The
representation of these dependencies is shown below:
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
A join dependency (JD) can be said to exist if the join of R1 and R2 over C is equal to relation R.
Where, R1 and R2 are the decompositions R1(A, B, C), and R2 (C,D) of a given relations R (A, B,
C, D).
In other words, *(A, B, C, D), (C, D) will be a join dependency of R if the join of the join's
attributes is equal to relation R.
*( R1, R2, R3 ....) indicates that relations R1, R2, R3 and soon are a join dependency (JD) of R.
Example 1
This relation can be decomposed into three relations namely, HAS_SKILL, NEEDS_SKILL cind
ASSIGNED_TO_JOBS.
17
fig. the join dependencies of decomposed relations.
This relation stores the data about persons who have skills applicable to a particular job. But,
each person who has a skill required for a particular job need not be assigned to that job.
Thus, redundant tuples (rows) that show unnecessary SKILL-TYPE and JOB combinations are
removed by joining with relation NEEDS_SKILL.
o In a database, it breaks the table into multiple tables. o If the relation has no proper decomposition,
o Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies,
and redundancy.
18
Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the decomposition will be lossless.
o The lossless decomposition guarantees that the join of relations will result in the same relation as it was
decomposed. o The relation is said to be lossless decomposition if natural joins of all the decomposition give the
original relation.
Example:
EMPLOYEE_DEPARTMENT table:
19
Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation
will look like:
Employee Department
20
60 Jack 40 Noida 678 Testing
Dependency Preserving
o It is an important constraint of the database. o In the dependency preservation, at least one
o If a relation R is decomposed into relation R1 and R2, then the dependencies of R either must be a part
of R1 or R2 or must be derivable from the combination of functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The
relational R is decomposed into R1(ABC) and R2(AD) which is dependency preserving because FD A-
>BC is a part of relation R1(ABC).
21