[go: up one dir, main page]

0% found this document useful (0 votes)
7 views52 pages

212 Lecture 11 Chapter8-Normalization

This document discusses the principles of functional dependencies and normalization in relational databases, outlining guidelines for designing relation schemas to avoid issues like redundancy and update anomalies. It details the different normal forms (1NF, 2NF, 3NF, and BCNF) and their requirements, emphasizing the importance of ensuring that attributes are properly related and dependencies are managed. The document also provides examples of functional dependencies and the implications of various design choices on database integrity.

Uploaded by

s.hasan7510
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views52 pages

212 Lecture 11 Chapter8-Normalization

This document discusses the principles of functional dependencies and normalization in relational databases, outlining guidelines for designing relation schemas to avoid issues like redundancy and update anomalies. It details the different normal forms (1NF, 2NF, 3NF, and BCNF) and their requirements, emphasizing the importance of ensuring that attributes are properly related and dependencies are managed. The document also provides examples of functional dependencies and the implications of various design choices on database integrity.

Uploaded by

s.hasan7510
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 52

Basics of Functional

Dependencies and Normalization


for Relational Databases
Chapter # 08
Introduction
• Levels at which we can discuss goodness
of relation schemas
– Logical (or conceptual) level
– Implementation (or physical storage) level
• Approaches to database design:
– Bottom-up or top-down
Informal Design Guidelines
for Relation Schemas
• Measures of quality
– Making sure attribute semantics are clear
– Reducing redundant information in tuples
– Reducing NULL values in tuples
– Disallowing possibility of generating spurious
tuples
Semantics of the Relation
Attributes
• Design relation schema so that it is easy
to explain its meaning
• Do not combine attributes from multiple
entity types and relationship types into a
single relation
GUIDELINE 1
Informally, each tuple in a relation should represent one
entity or relationship instance. (Applies to individual
relations and their attributes).
-Attributes of different entities (EMPLOYEEs,
DEPARTMENTs, PROJECTs) should not be mixed in
the same relation
- Only foreign keys should be used to refer to other entities
- Entity and relationship attributes should be kept apart as
much as possible.
Redundant Information in Tuples
and Update
Anomalies
• Grouping attributes into relation schemas
– Significant effect on storage space
• Storing natural joins of base relations
leads to update anomalies
• Types of update anomalies:
– Insertion
– Deletion
– Modification
Anomalies Example
Consider the relation:
EMP_PROJ ( Emp#, Proj#, Ename, Pname,
No_hours)

• Update Anomaly: Changing the name of


project number P1 from “Billing” to
“Customer-Accounting” may cause this
update to be made for all 100 employees
working on project P1.
Anomalies Example
• Insert Anomaly: Cannot insert a project unless
an employee is assigned to .
Inversely - Cannot insert an employee unless
an he/she is assigned to a project.
• Delete Anomaly: When a project is deleted, it
will result in deleting all the employees who work
on that project. Alternately, if an employee is the
sole employee on a project, deleting that
employee would result in deleting the
corresponding project
Anomalies Example
GUIDELINE 2
• Design base relation schemas so that no
update anomalies are present in the
relations
• If any anomalies are present:
– Note them clearly
– Make sure that the programs that update the
database will operate correctly
NULL Values In tuple
• May group many attributes together into a
“fat” relation
– Can end up with many NULLs
• Problems with NULLs
– Wasted storage space
– Problems understanding meaning
GUIDELINE 3
Relations should be designed such that their
tuples will have as few NULL values as possible
• Attributes that are NULL frequently could be
placed in separate relations (with the primary
key)
• Reasons for nulls:
– attribute not applicable or invalid
– attribute value unknown (may exist)
– value known to exist, but unavailable
Spurious Tuples
• Bad designs for a relational database may
result in erroneous results for certain JOIN
operations
Spurious Tuples
Spurious Tuples
Spurious Tuples
Spurious Tuples
GUIDELINE 4
• Design relation schemas to be joined with
equality conditions on attributes that are
appropriately related
– Guarantees that no spurious tuples are
generated
• Avoid relations that contain matching
attributes that are not (foreign key, primary
key) combinations
Functional dependencies
• Functional dependencies (FDs) are used to
specify formal measures of the "goodness" of
relational designs
• FDs and keys are used to define normal forms
for relations
• FDs are constraints that are derived from the
meaning and interrelationships of the data
attributes
• A set of attributes X functionally determines a
set of attributes Y if the value of X determines a
unique value for Y
Functional dependencies
• A Functional Dependency describes a
relationship between attributes within a single
relation.

• An attribute is functionally dependent on another


if we can use the value of one attribute to
determine the value of another.
Functional dependencies
• X -> Y holds if whenever two tuples have the same value
for X, they must have the same value for Y

• For any two tuples t1 and t2 in any relation instance r(R):


If t1[X]=t2[X], then t1[Y]=t2[Y]

• X -> Y in R specifies a constraint on all relation


instances r(R)

• FDs are derived from the real-world constraints on the


attributes
Functional dependencies
Functional dependencies
Student_ID → Student_Major
Student_ID, CourseNumber,Semester → Grade
Course_Number, Section → Professor, Classroom, NumberOfStudents
Examples of FD constraints
• social security number determines employee
name
SSN -> ENAME
• project number determines project name and
location
PNUMBER -> {PNAME, PLOCATION}
• employee ssn and project number determines
the hours per week that the employee works on
the project
{SSN, PNUMBER} -> HOURS
Examples of FD constraints
• An FD is a property of the attributes in the
schema R

• The constraint must hold on every relation


instance r(R)

• If K is a key of R, then K functionally determines


all attributes in R (since we never have two
distinct tuples with t1[K]=t2[K])
Examples of FD constraints
Normalization
• Normalization: The process of decomposing
unsatisfactory "bad" relations by breaking up
their attributes into smaller relations

• Normal form: Condition using keys and FDs of


a relation to certify whether a relation schema is
in a particular normal form
Definitions of Keys and Attributes
Participating in Keys
• A superkey of a relation schema R = {A1, A2,
...., An} is a set of attributes S subset-of R with
the property that no two tuples t1 and t2 in any
legal relation state r of R will have t1[S] = t2[S]

• A key K is a superkey with the additional


property that removal of any attribute from K will
cause K not to be a superkey any more.
Definitions of Keys and Attributes
Participating in Keys
• If a relation schema has more than one key, each is
called a candidate key. One of the candidate keys is
arbitrarily designated to be the primary key, and the
others are called secondary keys.

• A Prime attribute must be a member of some candidate


key

• A Nonprime attribute is not a prime attribute—that is, it


is not a member of any candidate key.
First Normal Form
Disallows composite attributes, multivalued attributes,
and nested relations; attributes whose values for an
individual tuple are non-atomic

Techniques to achieve first normal form


– Remove attribute and place in separate relation
– Expand the key
– Use several atomic attributes
First Normal Form
First Normal Form
1. Remove the attribute Dlocations that violates 1NF and place it in a
separate relation DEPT_LOCATIONS along with the primary key
Dnumber of DEPARTMENT

2. Expand the key so that there will be a separate tuple in the original
DEPARTMENT relation for each location of a DEPARTMENT
First Normal Form
3. If a maximum number of values is known for the attribute—for
example, if it is known that at most three locations can exist for a
department—replace the Dlocations attribute by three atomic
attributes:Dlocation1, Dlocation2, and Dlocation3.
Second Normal Form
Uses the concepts of FDs, primary key

Definitions:
Prime attribute - attribute that is member of the
primary key K
Full functional dependency - a FD Y -> Z where
removal of any attribute from Y means the FD
does not hold any more
Second Normal Form
- {SSN, PNUMBER} -> HOURS is a full
FD since neither SSN -> HOURS nor
PNUMBER -> HOURS hold

- {SSN, PNUMBER} -> ENAME is not a


full FD (it is called a partial dependency )
since SSN -> ENAME also holds
Second Normal Form
• A relation schema R is in second normal
form (2NF) if every non-prime attribute A
in R is fully functionally dependent on the
primary key
Second Normal Form
Third Normal Form
A relation schema R is in third normal
form (3NF) if it is in 2NF and no non-
prime attribute A in R is transitively
dependent on the primary key
Third Normal Form
• Third normal form (3NF)is based on the concept
of transitive dependency

• A functional dependency X→Y in a relation


schema R is a transitive dependency if there
exists a set of attributes Z in R that is neither a
candidate key nor a subset or any key of R and
both X→Z and Z→Y hold
Third Normal Form
Examples:
SSN -> DMGRSSN is a transitive FD
Since SSN -> DNUMBER
DNUMBER -> DMGRSSN
SSN -> ENAME is non-transitive
Since there is no set of attributes X where
SSN -> X and X -> ENAME
Third Normal Form
Third Normal Form
In X -> Y and Y -> Z, with X as the primary
key, we consider this a problem only if Y is
not a candidate key. When Y is a
candidate key, there is no problem with
the transitive dependency .
E.g., Consider EMP (SSN, Emp#, Salary ).
Here, SSN -> Emp# -> Salary and Emp#
is a candidate key
Summary
• 1st normal form
All attributes depend on the key,multi value not
allowed
• 2nd normal form
All attributes depend on the whole key
• 3rd normal form
All attributes depend on nothing but the key
Summary of Normal Forms Based on Primary
Keys and Corresponding Normalization
Boyce-Codd Normal Form
• A relation schema R is in Boyce-Codd Normal Form
(BCNF) if whenever an FD X -> A holds in R, then X is a
superkey of R
• Each normal form is strictly stronger than the previous
one
Every 2NF relation is in 1NF
Every 3NF relation is in 2NF
Every BCNF relation is in 3NF
• There exist relations that are in 3NF but not in BCNF
• The goal is to have each relation in BCNF (or 3NF)
Boyce-Codd Normal Form
• Every determinant in table is a candidate
key
– Has same characteristics as primary key, but
for some reason, not chosen to be primary
key
• When table contains only one candidate
key, the 3NF and the BCNF are equivalent
• BCNF can be violated only when table
contains more than one candidate key
Boyce-Codd Normal Form

A table that is in 3NF but not in BCNF


Boyce-Codd Normal Form
Boyce-Codd Normal Form
The difference between 3NF and BCNF is that
for a functional dependency A  B,

3NF allows this dependency in a relation if B is a


primary-key attribute and A is not a candidate
key,

whereas BCNF insists that for this dependency


to remain in a relation, A must be a candidate
key.
Boyce-Codd Normal Form
An example table from the University Database might be as follows:
If we know the Student Number and Teacher Code we know the
Offering (class) the student is in. We also know the review date for
that student and teacher (Student progress is reviewed for that class
by the teacher and student).
Boyce-Codd Normal Form
The dependencies are
S_Num, T_Code →Offering#, Review Date

which means that the table is in third normal form.


The table is not in BCNF as if we know the offering
number we know who the teacher is. Each offering can
only have one teacher!
Offering# → T_Code

A non key attribute is a determinant.


Boyce-Codd Normal Form

You might also like