[go: up one dir, main page]

0% found this document useful (0 votes)
24 views37 pages

05 - Relational Database Design - Week 05

The document outlines guidelines for designing relational database management systems (RDBMS), emphasizing the importance of avoiding anomalies such as insertion, deletion, and update anomalies to maintain data integrity. It discusses functional dependencies and normalization, detailing the various normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF) that help structure databases efficiently. Additionally, it defines keys, superkeys, and candidate keys, which are essential for uniquely identifying rows in tables.

Uploaded by

vamikabhardwaj9
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as KEY, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views37 pages

05 - Relational Database Design - Week 05

The document outlines guidelines for designing relational database management systems (RDBMS), emphasizing the importance of avoiding anomalies such as insertion, deletion, and update anomalies to maintain data integrity. It discusses functional dependencies and normalization, detailing the various normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF) that help structure databases efficiently. Additionally, it defines keys, superkeys, and candidate keys, which are essential for uniquely identifying rows in tables.

Uploaded by

vamikabhardwaj9
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as KEY, PDF, TXT or read online on Scribd
You are on page 1/ 37

Unit # 3 - Relational DBMS Design Guidelines,

- Functional Dependencies - Normalisation for RDBMS, - Normal


Forms ( 1NF,2NF,3NF,BCNF,4NF,5NF) (Week-5)
RDBMS Design
Guidelines
In the design of RDBMS there are several important practices that should be kept in
mind to create databases that are a stable ,efficient, scalable, and maintainable.
There are four informal guidelines that may be used as measures to determine
the quality of relation schema design:

Making sure that the semantics of the attributes is clear in the schema
Reducing the redundant information in tuples
Reducing the NULL values in tuples
Disallowing the possibility of generating spurious tuples Chapter-15
GUIDELINE
-1
This guideline can be stated more formally as below :
“Informally, each tuple in a relation should represent one entity or
relationship instance.
This essentially means , Design a
relation schema so that it is easy to
explain its meaning. Do not combine
attributes from multiple entity types and
relationship types into a single relation.
If the schema is complex it shall suffer
from three anomalies.
RDBMS Design Guidelines
(contd)
1. Insertion anomaly
2. Deletion anomaly
3. Update anomaly

1. Insertion Anomaly :
Insertion anomaly occurs in relational databases when the structure of the table or the
design of the schema makes it difficult to insert new data without introducing
redundancy, inconsistency, or irrelevant data. This typically happens when the table is
not properly normalized, causing dependencies between unrelated data.
Consider In the COMPANY database, we have a single table called
EMPLOYEE_PROJECT that stores both employee information and project
information together.

This table combines data about


employees and the projects
they work on which were above
correctly separated in two
different tables.

Now, let’s assume we want to add a new project, called Project Gamma, located at
the Mechanical Engineering Department but there are yet no employees assigned to
this project. In the current structure, we cannot we cannot do this task since there is
no employee information. This dependency is insertion anomaly.
RDBMS Design Guidelines
(contd)
2. Deletion Anomaly :
A deletion anomaly shall occurs in a relational database when the deletion of data
inadvertently results in the loss of other valuable information i.e deleting a record
related to one entity can unintentionally delete information related to another entity.
In the EMPLOYEE_PROJECT table now let’s assume Rajesh is no longer working on
Project Alpha, and we decide to delete his record for Project Alpha from the
EMPLOYEE_PROJECT table.
Rajesh's information related to Project Alpha was deleted, but along with that, his date
of birth (Bdate), address, and department number (Dnumber) related to that project
were also removed.
In fact , If Rajesh was only working on Project Alpha and not on any other projects,
deleting this record would also remove all information about Rajesh, including his
address, birthdate, and employment details, even though that information might be
useful for other purposes (e.g., HR records).
This is Deletion anomaly.
3. Update Anomaly :
An update anomaly occurs when updating a piece of information in one place requires
updating the same data in multiple places. For ex if Rajesh moves from his old 12 MG
Road address to say a new address 45,Ashok Vihar this update needs to be done in
all places . If not all instances are updated correctly, the database can become
inconsistent.
RDBMS Design Guidelines
GUIDELINE (contd)
-2
This guideline stresses the importance of eliminating data anomalies—insertion,
deletion, and update anomalies—from the database schema to ensure data integrity
and efficiency. In cases where these anomalies cannot be avoided, the design should
clearly document the potential issues, and database applications or update programs
should handle these anomalies carefully to avoid inconsistency. More formally ,

“Design a schema that does not suffer from the insertion, deletion and
update anomalies. If there are any anomalies present, then note them so
that applications can be made to take them into account.”

GUIDELINE
-3
This guideline emphasises minimising the use of null values within relation schemas.
Null values can eat up space and can introduce ambiguity because they may signify
different situations such as ,the attribute does not apply to the entity. The attribute's
value is unknown. The attribute's value is known but hasn't been recorded yet.

This guideline can be stated more formally as below :

“Relations should be designed such that their tuples will have as few NULL
values as possible . Attributes that are NULL frequently could be placed in
separate relations (with the primary key).”
RDBMS Design Guidelines
(contd)
For example, if only 15 percent of employees have individual offices, there is little
justification for including an attribute Office_number in the EMPLOYEE relation;
rather, a relation EMP_OFFICES(Essn, Office_number) can be created to include
tuples for only the employees with individual offices.

GUIDELINE
-4
This guideline addresses the prevention of spurious tuples during joins. A spurious
tuple is an unintended and incorrect result created when joining tables on attributes
that do not ensure a meaningful or valid join. When relation schemas are not
designed with primary key-foreign key constraints for joining, joining these tables may
produce tuples that do not represent actual or meaningful data. More formally ,

“Design relation schemas so that relations do not contain matching


attribute other than foreign key - primary key combination so that no
spurious tuples (additional tuples) are generated in join operation.”
RDBMS Design Guidelines
(contd)

If we join these tables on Plocation (a non-key attribute in both tables), the result
might include invalid or duplicate records (spurious tuples), suggesting erroneous
relationships between employees and projects. By following Guideline 4, the
design would instead ensure joins are based on SSN and Pnumber—attributes
that act as a primary key and foreign key—maintaining the integrity of the
information represented​.
Form more details , Please refer the pdf. Explanation for Figure 15.5/page# 511
Functional Dependeny
Functional Dependency concept is a formal tool for analysis of relational schemas
that enables us to detect and describe RDBMS Design Guidelines in precise terms i.e
it is

used to specify formal measures of the "goodness" of relational designs


Definition :
A functional dependency is defined as a constraint between two sets of attributes say
X and Y of relation R such that for any two tuples t1 and t2 in R that have t1[X]=t2[X],
we must also have t1[Y]=t2[Y].
What this means basically is that value of Y component of tuple in r depend on or
determined by value of X component. We can also say that there is functional
dependency from X to Y or that Y is functionally dependent on X. It is shown as
XY

In our example of relation EMPLOYEE we can see Dept_ID → Dept_Name


because it satisfies the condition t1[X]=t2[X], we have t1[Y]=t2[Y].
Functional Dependeny
Another example: (contd)

The notation t1[X]=t2[X] ⇒ 𝑡1[𝑌]=𝑡2[𝑌] describes the core concept of functional


dependency. It ensures that when attributes are dependent on others, any two rows
with the same values for the determinant attribute(s) must have the same values for the
dependent attribute(s). This property is crucial in database design for maintaining data
integrity and consistency.
Some other examples from COMPANY database:

ssn -> eName (Social security number determines employee name)


Pnumber -> {Pname, Plocation } (Project number determines project name and
location)
{ssn, Pnumber} -> hour (Employee ssn and project number determines the hours per
week that the employee works on the project )
Functional Dependeny
(contd)
Functional dependency can be either Full or Partial.
Full Functional dependency
A functional dependency X → Y is a full functional dependency if removal of
any
attribute A from X means that the dependency does not hold any more.
Eg:
In Figure below , {Ssn, Pnumber} → Hours is a full dependency (neither Ssn →
Hours nor Pnumber→Hours holds).
Partial Functional dependency
A functional dependency X→Y is a partial dependency if some attribute A ε X
can be removed from X and the dependency still holds.
Eg:
The dependency {Ssn, Pnumber}→Ename is partial because Ssn→Ename holds.
Functional Dependency
(contd)
Armstrong's Inference
Rules :
Armstrong's Inference Rules are a set of axioms or rules to derive all possible
functional dependencies from a given set of functional dependencies. These rules
help determine the closure of a set of functional dependencies, which is essential in
tasks like normalization and ensuring the consistency of database design. The
Armstrong’s three primary Inference rules (IRs) are :
1. Reflexive : If Y subset-of X, then X -> Y
2. Augmentation : If X -> Y, then XZ -> YZ (Notation: XZ stands for X U Z)

3. Transitive : If X -> Y and Y -> Z, then X -> Z

1. IR1 -
Reflexive
Any attribute set (or combination) can determine its own subset. This is trivial and
holds by definition i.e If Y is a subset of X, then X → Y.
Eg. If X = {Emp_ID, Emp_Name} and Y = {Emp_ID}, then
{Emp_ID, Emp_Name} → {Emp_ID}.

2. IR2 -
IfAugmentation
an attribute set X can determine Y, then adding more attributes (denoted by Z) to both
X and Y does not change the dependency.
Functional Dependency
Essentially, you can (contd)
"augment" both sides of a dependency with the same set of
attributes without changing the meaning of the dependency.
Eg:
If Emp_ID → Emp_Name, i.e Emp_ID uniquely determines Emp_Name (each
employee has a unique identifier that determines their name).
Then , According to the augmentation rule, if Emp_ID → Emp_Name, we can add
any additional attribute set (say, Dept_ID) to both sides of this dependency, and the
dependency will still hold. So, using augmentation:
{Emp_ID, Dept_ID} → {Emp_Name, Dept_ID}.
Augmentation is often used in conjunction with other rules (like transitivity) to derive
additional dependencies that help with database normalization.

3. IR3 -
IfTransitivity
X determines Y and Y determines Z, then X indirectly determines Z. This is similar to
the transitive property in mathematics.
Eg:
If Emp_ID → Dept_ID and Dept_ID → Dept_Name, then Emp_ID → Dept_Name.

IR1, IR2, IR3 form a sound and complete set of inference rules. All
other rules that hold can be deduced from these rules.
Functional Dependency
(contd)
Some additional inference rules that are derived using IR1 , IR2 & IR3 are
Decomposition: If X -> YZ, then X -> Y and X -> Z
If X determines both Y and Z together, then X can determine each one separately.
Example: If Emp_ID → {Emp_Name, Dept_ID}, then Emp_ID → Emp_Name and
Emp_ID → Dept_ID.
Union: If X -> Y and X -> Z, then X -> YZ
Conversely , If X determines Y and X determines Z, then X determines the union of Y
and Z.
Example: If Emp_ID → Emp_Name and Emp_ID → Dept_ID, then Emp_ID →
{Emp_Name, Dept_ID}.
Psuedotransitivity: If X -> Y and WY -> Z, then WX -> Z
If X determines Y and a combination of W and Y determines Z, then the combination
of W and X can determine Z.
Example: If Emp_ID → Dept_ID and {Role, Dept_ID} → Salary, then {Role, Emp_ID}
→ Salary.
Normalisation for RDBMS
Normalization is the process of organizing the data in the database. More
specifically , It is used to minimize the redundancy from a relation or set of relations.
It divides the larger table into smaller and links them using relationships.
The main goal for normalizing the relations is removing INSERTION, DELETION and
UPDATE anomalies. Failure to eliminate anomalies leads to data redundancy and
can cause data integrity and other problems as the database grows. Normalization
consists of a series of guidelines that helps to guide in creating a good database
structure.
Normalization works through a series of stages called Normal forms. The normal
forms apply to individual relations. The relation is said to be in particular normal form
if it satisfies constraints.

Key Goals of Normalization


Eliminate Redundancy: Avoid storing the same data in multiple places, which
reduces storage costs and keeps data consistent.
Prevent Anomalies: Ensure that insertions, deletions, and updates can occur
without causing inconsistency.
Improve Data Integrity: Enforce consistency through dependencies and
relationships between tables.
Optimize Query Performance: A well-structured database is easier to query and
maintain over time.
A quick Wrap up on Keys
Key
1.

A Key is an attribute (or a combination of attributes) used to uniquely identify a row
in a table.
2. Super Key
A Superkey is any combination of attributes that can uniquely identify a row in a
table
Superkeys include not only the primary key but also any additional attributes that
can still uniquely identify each row.
Example: In a table with attributes Emp_ID, Emp_Name, and Dept_ID,
{Emp_ID} is a superkey, and so are {Emp_ID, Emp_Name} and {Emp_ID,
Dept_ID}.
3. Candidate Key
A Candidate Key is a minimal superkey—meaning it’s a superkey with no
unnecessary attributes.
Candidate keys are potential choices for the primary key, as each candidate key can
uniquely identify rows in a table.
A table can have multiple candidate keys, and one of them is chosen the primary key
Example: In a table with Emp_ID and SSN, both could be candidate keys if each
is unique for each row.

Every Primary Key is a Candidate Key, but not all candidate keys are chosen as the
primary key.
Every Candidate Key is a Superkey, but not all superkeys are candidate keys (since
superkeys can have unnecessary attributes).
Normal
Forms
The normalization procedure provides database designers with series of tests to
certify whether a relation satisfies a certain normal form. It is a process which
proceeds in a top-down fashion by evaluating each relation against the criteria for
normal forms and decomposing relations as necessary.
Simply put, it can be considered as a “filtering” or “purification” process to
make the design have successively better quality.
Unsatisfactory relation schemas that do not meet certain conditions—the normal
form tests—are decomposed into smaller relation schemas that meet the tests
and hence possess the desirable properties.
The normal form of a relation refers to the highest normal form condition that it
meets, and hence indicates the degree to which it has been normalized.
Codd first proposed in 1972 ,three normal forms 1NF , 2NF , 3NF(BCNF). Later, a
fourth normal form (4NF) and a fifth normal form (5NF) were proposed, based on
the concepts of multivalued dependencies and join dependencies
The database designers need not normalize a relation to the highest possible
normal form. Relations may be left in a lower normalization status, such as 2NF, if
performance is achieved.

1. First Normal
Form
A relation is said to be in 1NF if it contains only atomic (indivisible) values, and each
column contains only one value per row. Thus , First normal form (1NF) is
actually considered to be part of the formal definition of a relation
Normal Forms (contd)
Historically , it was defined to disallow multivalued attributes, composite attributes,
and their combinations. It states that the domain of an attribute must include only
atomic (simple, indivisible) values and that the value of any attribute in a tuple must
be a single value from the domain of that attribute. Hence, 1NF disallows having a
set of values, a tuple of values, or a combination of both as an attribute value for a
single tuple.

Rules:
1. All values in each column must be atomic, meaning no repeating groups or arrays.
2. Each column must contain only one type of data.
3. Each row must be unique.
Normal Forms (contd)
2. Second Normal
Form normal form (2NF) is based on the concept of full functional dependency.
Second
A relation is said to be in 2NF, if it is in 1NF and every non-key attribute is fully
functionally dependent on the primary key.
A functional dependency X → Y is a full functional dependency if removal of
any attribute A from X means that the dependency does not hold any more. Refer slide
#10.
Rules
1. The table must be in 1NF.
2. Every non-key attributes is fully depend on the Primary Key i.e Partial dependency
should NOT exist.
In the below relation , FD2 & FD3 are partial hence this relation is not 2NF. The
functional dependencies FD2 and FD3 make Ename, Pname, and Plocation partially
dependent on the primary key {Ssn, Pnumber} of EMP_PROJ, thus violating the 2NF
test.
Normal Forms
(contd)
To make this relation 2NF we can split the table into three tables, EP1 with
(Ssn,Pnumber, Hours ) , EP2 with (Ssn,Ename) and EP3 with (Pnumber ,
Pname ,Plocation) to eliminate partial dependencies.

3. Third Normal
Form
Third normal form (3NF) is based on the concept of transitive dependency.
A relation is in 3NF if it is in 2NF and there are NO transitive dependencies.
A functional dependency X→Z in a relation schema R is a transitive
dependency if
both X→Y and Y→Z hold . The only restriction is that the attributes Z in R is neither a
candidate key nor a subset of any key of R. Refer slide#12
Rules:
1. The table must be in 2NF.
2. No transitive dependency should exist, which means that non-key attributes should
not depend on other non-key attributes.
Normal Forms
(contd)
In the EMP_DEPT relation below , the dependency Ssn→Dmgr_ssn is transitive
through Dnumber because both the dependencies Ssn → Dnumber and Dnumber →
Dmgr_ssn hold and Dnumber is neither a key itself nor a subset of the key of
EMP_DEPT.
Hence , EMP_DEPT is not in 3NF because of the transitive dependency of Dmgr_ssn
(and also Dname) on Ssn via Dnumber.
We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas
ED1 and ED2 as shown in fig below.
shown in Figure
Normal Forms
1NF, 2NF ,3NF(contd)
Recap :-
In general, we want to design relation schemas so that they have neither partial
nor transitive dependencies because these types of dependencies cause the update
Anomalies.
Below table summarise the 1NF,2NF &3NF along with test and procedure of how to
reduce them to required normal form.
Normal Forms
(contd)
4. Boyce-Codd Normal
Formis a stricter form of 3NF. A relation is in BCNF if it is in 3NF and, for every
BCNF
functional dependency X → Y, X is a superkey.
A relation might be in 3NF but not necessarily in BCNF: This occurs
when there are dependencies involving non-superkey attributes, which BCNF does not
allow. This is explained below:
Consider the COMPANY database with attributes
{Project_ID, Project_Manager , Emp_ID , Emp_Name }
Lets define some functional dependencies as below :
Project_ID → Project_Manager (Each project has a single manager.)
Emp_ID → Emp_Name (Each employee has a unique name.)
Project_ID, Emp_ID → Emp_Name (An employee can work on multiple projects,
so Project_ID and Emp_ID together determine Emp_Name.)
3NF Check
To determine if this relation is in 3NF we have rules as per slide# 19
o

oi.e Relation should be in 2NF . This means there is No partial dependencies


o No transitive dependencies for non-key attributes.
The table meets the requirements for 3NF because:
o

o Every non-key attribute depends directly on a candidate key.


o There are no transitive dependencies from non-prime attributes.
Normal Forms
BCNF Check (contd)
To determine if this relation is in BCNF every determinant must be a superkey. Let’s
see if that’s the case here.
Project_ID → Project_Manager:
In this dependency, Project_ID is not a superkey for the entire table (because it does
not uniquely determine Emp_ID and Emp_Name). Thus, this dependency violates
BCNF.
Emp_ID → Emp_Name:
This dependency doesn’t violate BCNF since Emp_ID uniquely determines
Emp_Name.
Project_ID, Emp_ID → Emp_Name:
This composite key dependency does not violate BCNF since Project_ID and Emp_ID
together form a superkey.
Thus , Since Project_ID → Project_Manager does not meet BCNF requirements
(because Project_ID is not a superkey), this table is in 3NF but not in BCNF.
To bring this relation into BCNF, we need to decompose it into separate tables based
on dependencies.
PROJECT {Project_ID , Project_Manager }
EMPLOYEE_PROJECT {Project_ID , Emp_ID , Emp_Name }
Normal Forms
(contd)
After decomposition, each table now satisfies BCNF:
In the PROJECT table,
Project_ID is a superkey for determining Project_Manager.
In the EMPLOYEE_PROJECT table,
each functional dependency either has Emp_ID as a key (for Emp_Name) or
Project_ID, Emp_ID together as a composite key.

This example illustrates that:


The original table was in 3NF but not in BCNF due to a dependency where the
determinant (Project_ID) was not a superkey.
Decomposition was required to bring the table into BCNF.
3NF is often sufficient for practical applications, but BCNF eliminates further
anomalies by enforcing stricter rules, especially when there are dependencies
involving non-superkey attributes.

Note:
You can answer following two questions based on above understanding
1. What is BCNF
2. Show with example a relation that is in 3NF but not in BCNF. How will you convert
the relation to BCNF .
Normal Forms
4. Fourth Normal
Form
(contd)
4NF addresses a specific type of dependency called a multi-valued dependency
(MVD). A relation is in 4NF if:
o

o It is already in Boyce-Codd Normal Form (BCNF).


It has
1NF ,o 2NF no multi-valued
, 3NF dependencies
and BCNF depends on the other than
concept ofafunctional
trivial one.
dependency, which
is by far the most important type of dependency in relational database design theory.
However, in many cases relations have constraints that cannot be specified as
functional dependencies. A Multivalued Dependency (MVD) occurs when, in a
relation, two attributes are independent of each other but both depend on a third
attribute. 4NF (fourth normal form) , is based on this type of dependency.

Consider a relation 𝑅 containing information about


a student's hobbies and the courses they are
taking:
Here ,
Priya has multiple hobbies:
Singing and Painting.
Priya is enrolled in multiple courses:
Mathematics and Physics.
For this relation:
Hobby is independent of Course for a given student. But both dependent on
Student
Student→→Hobby , Student→→Course
Normal Forms
(contd)
To keep the relation state consistent, and to avoid any spurious relationship between
the two independent attributes, we split the EMP table to two relations as below.

5. Fifth Normal
AForm
relation R is in 5NF if and only if every join dependency in R is implied by the
candidate keys of R.
What this actually means is a relation decomposed into two relations must have loss-
less join Property, which ensures that no spurious or extra tuples are
generated, when relations are reunited through a natural join based on candidate
Key.
Rules
1. It is in 4NF.
2. It cannot be decomposed into smaller relations without losing information, meaning
Normal Forms
(contd)
In many relations , A join dependency exists when a relation can be decomposed
into multiple smaller relations that can be joined back to get the original relation
without any loss of information.
In simpler terms, 5NF ensures that complex relationships among data are
structured in a way that does not introduce redundancy if they are joined using
candidate keys.
Lossless Join and Dependency
Preserving Decomposition
When designing a relational database, it is often necessary to decompose large tables
into smaller ones to remove redundancy and anomalies (insertion, deletion, and
update). However, decomposition should meet two important criteria to ensure data
integrity and usability:
Lossless Join:
Ensures no data is lost when decomposed tables are joined back together.
Dependency Preservation:
Ensures all functional dependencies in the original table are maintained in the
decomposed tables.

Lossless Join
A decomposition is said to be lossless if, after decomposing a relation 𝑅 into smaller
relations 𝑅1 and 𝑅2, the original relation 𝑅 can be reconstructed by performing a
Decomposition

natural join on 𝑅1​ and 𝑅2 without losing any data or introducing spurious tuples.
More formally ,
A decomposition of a relation R into R1​and R2​is lossless if: R=R1⋈R2. This
means the result of the natural join of R1​and R2​contains exactly the tuples in the
original relation R. The decomposition is lossless if at least one of the following holds:

The intersection of 𝑅1​ and 𝑅2​ contains a candidate key of either 𝑅1​or 𝑅2​.
Key condition for Lossless Join are :

The intersection of 𝑅1 ​and 𝑅2 can functionally determine all attributes in one of


1.

2.

the decomposed tables.


Lossless Join Decomposition (contd)
Consider the COMPANY database with attributes from slide#22
{Project_ID, Project_Manager , Emp_ID , Emp_Name }
We decompose this relation into two smaller relations:
PROJECT {Project_ID , Project_Manager }
EMPLOYEE_PROJECT {Project_ID , Emp_ID , Emp_Name }
To check if this decomposition is lossless, we can perform a natural join on
PROJECT and EMPLOYEE_PROJECT
PROJECT ⋈ EMPLOYEE_PROJECT
The result matches the original table. Hence, this decomposition is lossless.

Dependency Preserving
ADecomposition
from the original relation 𝑅 are maintained in the decomposed relations 𝑅1,𝑅2,
decomposition is said to be dependency preserving if all functional dependencies

…,𝑅𝑛.

A decomposition of a relation 𝑅 into 𝑅1,…,𝑅𝑛 is dependency preserving if the union


Formally ,

of functional dependencies preserved in 𝑅1,…,𝑅𝑛​ is equivalent to the functional


dependencies in 𝑅.
Dependency Preserving
𝐹 ′ = 𝐹′ Decomposition
𝐹: The original set of functional dependencies in 𝑅.
Where:

𝐹′ : The union of functional dependencies preserved in 𝑅1,…,𝑅𝑛 .


Consider the previous example from COMPANY database.
{Project_ID, Project_Manager , Emp_ID , Emp_Name }
In this relation we can see following dependencies.
Project_ID → Project_Manager (Each project has a unique manager.)
Emp_ID → Emp_Name (Each employee has a unique name.)
Project_ID, Emp_ID → Emp_Name (For a given project, an employee’s ID
determines their name.)

We decompose this table into two smaller tables:


Dependency Preserving Decomposition
(contd)
We can now check Dependency Preservation
1. Project_ID → Project_Manager:
In the PROJECT table, Project_Manager can be uniquely identified by
Project_ID.
Dependency Project_ID → Project_Manager is preserved.
2. Emp_ID → Emp_Name:
In the EMPLOYEE_ASSIGNMENT table, Emp_ID uniquely determines
Emp_Name.
Dependency Emp_ID → Emp_Name is preserved.
3. Project_ID, Emp_ID → Emp_Name:
In the EMPLOYEE_ASSIGNMENT table, the combination of Project_ID and
Emp_ID uniquely determines Emp_Name.
Dependency Project_ID, Emp_ID → Emp_Name is preserved.

The decomposition into PROJECT and EMPLOYEE_ASSIGNMENT preserves all


the functional dependencies from the original relation. Hence, this decomposition is
dependency preserving.
Multivalued Dependency in
A Multivalued RDBMS
Dependency occurs when two distinct attributes in a table are
independent of each other but depend on a third attribute.
Thus a multivalued dependency always requires at least three attributes.
Formally ,
Given a relation R, a multi-valued dependency X→→Y exists if, for each value of X, a
set of values for Y is associated independently of the values of any other attributes
(denoted as Z) in the relation.
Eg.

Consider a relation 𝑅 containing


information about a student's hobbies and
the courses they are taking:
Here ,
Priya has multiple hobbies:
Singing and Painting.
Priya is enrolled in multiple courses:
Mathematics and Physics.

For this relation:


Hobby is independent of Course for a given student. But both dependent on
Student
Student→→Hobby , Student→→Course
This leads to data redundancy, as for each hobby, all course combinations are
repeated. Thus, a multivalued dependency would prevent the 4NF.
Multivalued Dependency in DBMS
4NF, decompose 𝑅 into two relations as was done on slide #:
(contd)
To achieve
26

This decomposition removes redundancy while preserving the multi-valued


dependencies.
Join Dependency in
Join decomposition isDBMS
a further generalization of Multivalued dependencies. A relation
R is said to have a join dependency if and only if it can be decomposed into
multiple relations R1, R2, ..., Rn such that the join of these relations results in the
original relation R, without any data redundancy or information loss.
Formally,
If the join of R1 and R2 over C is equal to relation R, then we can say that a join
dependency (JD) exists. Where R1 and R2 are the decompositions R1(A, B, C) and
R2(C, D) of a given relations R (A, B, C, D).
Join Dependency
Ensures that a relation is decomposed in a way that eliminates redundancy.
Prevents anomalies like update, delete, or insert anomalies.
Ensures that data is stored in its minimal form.

Consider a relation
R(Student, Subject, Professor):
Problem:
The same Subject-Professor pair is
repeated for multiple students.
This redundancy can lead to update
anomalies.
For example, if a professor for a subject
changes, multiple rows will need to be updated.
Join Dependency in DBMS
(contd)
We can decompose this relation R(Student, Subject, Professor) into smaller relations:

The original relation R(Student, Subject, Professor) can be reconstructed by

𝑅= R1⋈R2
performing a natural join on R1(Student, Subject) and R2(Subject, Professor)

assignment do not require updating multiple rows and it can be seen that 𝑅= R1⋈R2
Thus this decomposition ensures that any changes in the Subject-Professor

Shall recover the original relation R. Hence Join Dependency ensures lossless
decomposition.
Thanks

You might also like