[go: up one dir, main page]

0% found this document useful (0 votes)
3 views53 pages

Chapter 4

Chapter 4 discusses database design theory and normalization, emphasizing the importance of creating well-structured relations to minimize redundancy and avoid anomalies such as insertion, deletion, and updation issues. It outlines the normalization process, detailing the different normal forms (1NF, 2NF, 3NF, BCNF, and 4NF) and their criteria for ensuring data integrity and efficiency. The chapter also covers functional dependencies and the significance of eliminating transitive dependencies to maintain a robust database structure.

Uploaded by

abcde1marc
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views53 pages

Chapter 4

Chapter 4 discusses database design theory and normalization, emphasizing the importance of creating well-structured relations to minimize redundancy and avoid anomalies such as insertion, deletion, and updation issues. It outlines the normalization process, detailing the different normal forms (1NF, 2NF, 3NF, BCNF, and 4NF) and their criteria for ensuring data integrity and efficiency. The chapter also covers functional dependencies and the significance of eliminating transitive dependencies to maintain a robust database structure.

Uploaded by

abcde1marc
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 53

Chapter-4

DATABASE DESIGN THEORY AND


NORMALIZATION
Outline
Well structure relation
In logical database design, a process called normalization is
used to produce a data model that has the properties of simplicity,
non-redundancy and minimal maintenance.

When designing a database, the analyst should aim to design


well-structured relations/tables.
Cont’d...
A well-structured relation contains a minimum amount of
redundancy and allows users to insert, update and delete the
rows in a table without errors or inconsistencies.

It will be designed by using a technique called normalization.

Normalization is the process of decomposing relations with


anomalies to produce smaller, well-structured relations.
Basic concepts of normalization
A technique of organizing a data into multiple related tables to
minimize data redundancy.

 Through normalization we want to design for our relational


database a set of files that
 Have as little redundancy as possible,

 Permit efficient updates of the data in the database, and

 Avoid the danger of losing data unknowingly.


Data redundancy

What is data redundancy? And why should we reduce it?

Table
Row-1 X
X
Row-2
X
Row-3 X
Row-4
In a relation if a fact is stored more than once ,there is data
redundancy.
Cont’d…
Data redundancy also leads to other issues like:
Insertion , Deletion and Updation problems

Student_table
ID Stu-name Dept Hod Phone-no
Ababa CS Mr. x 0918502888
AAU01
Yadiel CS Mr. x 0918502888
AAU02
John CS Mr. x 0918502888
AAU03
Jonatan CS Mr. x 0918502888
AAU04
Cont’d…
Repetition of data increases the size of databases and leads to
other more issues such as
Insertion anomalies

Deletion anomalies and

Updation anomalies

These types of error can creep into databases that are


insufficiently normalized.
Insertion anomalies

An "Insertion Anomaly" is a failure to place information about a


new database entry into all the places in the database where
information about that new entry needs to be stored.

In a properly normalized database, information about a new entry


needs to be inserted into only one place in the database.

In an inadequately normalized database, information about a new


entry may need to be inserted into more than one place.
Deletion anomalies
A "Deletion Anomaly" is a failure to remove information about
an existing database entry when it is time to remove that entry.

In an inadequately normalized database, information about that


old entry may need to be deleted from more than one place, and,
human fallibility being what it is, some of the needed additional
deletions may be missed.

Loss of related data when some other data is deleted


Updation anomalies
 Modification Anomaly: changing data in a row forces
changes to other rows because of duplication.
 Leads to data inconsistency .

 All three kinds of anomalies are highly undesirable.

 The main purpose of normalization is to reduce the chances


for anomalies to occur in a database
Purpose of normalization
The main goal of database normalization is to restructure the
logical data model of a database to:
Eliminate redundancy
Organize data efficiently
Reduce the potential for data anomalies
Cont’d…

There are well-accepted principles and rules that are used to


normalize a data model.

Types of normalizations are :


First normal form
Second normal form
Third normal form, BCNF.

A databases must be normalize to at least the level of the 3rd Normal


Form (3 NF).
First normal form(1NF)

oA table (relation) is in 1NF if


 There are no duplicated rows in the table.

Each cell is single-valued (i.e., there are no repeating


groups).
 Entries in a column (attribute, field) are of the same
kind.
Each column should have a unique name
Example: Look the following Employee table below
Cont’d…
Second Normal Form (2NF)

A table is in 2NF if it is in 1NF and if all non-key attributes


are dependent on all of the key attributes.

FD = x  y

If t1.x = t2.x then t1.y = t2.y


Functional dependency
Functional dependency is a relationship that exists when one
attribute uniquely determines another attributes.

For a relation R, attribute B is functionally dependent on attribute


A if for every valid instance of A, that value of A uniquely defines
the value of B.

If B is functionally dependent on A we write A  B

The attribute on the left-hand side of the arrow is called a


determinant.
Cont’d...
An attribute may be functionally dependent on two or more attributes

Example

The relation EMP_COURSE (Emp_ID, Course Title,


Date_Completed) can have its functional dependencies described as:
Emp_ID, Course_Title  Date_Completed

i.e., the date of course completed is completely determined by the identity of


the employee and the title of the course
Cont’d...
Other examples:
VIN  Make, Model, Colour, i.e. the make, model and colour
of a vehicle are functionally dependent on the vehicle
identification number

ISBN  Title, First_Author_Name, i.e. the title of a book and


the name of the first author are functionally dependent on the
book’s ISBN
Second normal form (2NF)
2NF = already in 1NF and every non key attribute is fully
functionally dependent on the primary key.

If 1NF will be in 2NF if any one of the following conditions


applies:
Cont’d...
The primary key consists of only one attribute(such as
Emp_ID in Employee)

thus all of the attributes in the relation are components of the


primary key

Every non key attribute is functionally dependent on the full


set of primary key attributes
Cont’d...
Employee is an example of a relation that is not in 2NF.

The primary key for this relation is the composite key Emp_ID
and Course_Title.

Therefore the non key attributes Name, Dept_Name and Salary


are functionally dependent on part of the primary key (Emp_ID)
but not on Course_Title.
Functional Dependencies in Employee

Dependency on entire primary key

EmpID CourseTitle Name DeptName Salary Date_Completed

Dependency on only part of the key

EmpID, CourseTitle  DateCompleted


EmpID  Name, DeptName, Salary

Therefore, NOT in 2nd Normal Form


Partial dependency
Partial functional dependency is a functional dependency in
which one or more nonkey attributes (such as Name) are
functionally dependent on part (but not all) of the primary key

In Employee this creates redundancy in that relation, which


results in anomalies when that table is updated (as noticed
previously: Insertion, Deletion and Modification anomalies)
Converting to 2NF
To do this we decompose the relation into new relations that
satisfy one (or more) of the conditions described above.
Employee relation is decomposed into the following 2 relations:

EMPLOYEE (Emp_ID, Name, Dept_Name, Salary). This


satisfies the first condition above and is in 2NF

EMP_COURSE(Emp_ID, Course_Title, Date_Completed)


This satisfies the third property above and is also in 2NF.
Cont’d…
Decomposed into two separate relations

EmpID Name DeptName Salary

EmpID CourseTitle DateCompleted

Both are full functional dependencies


Third normal form (3NF)
3NF = already in 2NF and no transitive dependencies exist.

Transitive dependency = a functional dependency between two (or more)


nonkey attributes.

E,g. SALES(Cust_ID, Name, Salesperson, Region) – Cust_ID the


primary key so that all the remaining attributes are functionally dependent
on this

However there is a transitive dependency, as Region is dependent on


Salesperson and Salesperson is dependent on Cust_ID

As a result there would be update anomalies for this .


Relation with transitive dependency
Relation with transitive dependency

CustID  Name
CustID  Salesperson BUT
CustID  Region
CustID  Salesperson  Region
All this is OK Transitive dependency
(2nd NF) (not 3rd NF)
Anomalies
Insertion anomaly because a new salesperson assigned to the
‘North Region’ cannot be entered until a customer has been
assigned, because a value for Cust_ID must be provided to insert a
row in the table
Deletion anomaly because if customer 6837 is deleted from the
table, we lose the information that salesperson Hernandez is
assigned to the ‘East Region’
Modification anomaly because if salesperson Smith is reassigned
to the ‘East Region’, several rows must be changed to reflect that
fact.
Removing transitive
dependencies
This can be done by decomposing SALES into two relations
(see following table.)

Salesperson, which is the determinant in the transitive


dependency in SALES, becomes the primary key in Sperson

Salesperson becomes a foreign key in SALES1

Also uses less storage space because the dependent data (such
as Region) do not have to repeat for each customer
Decomposing the SALES relation
Relations in 3NF

Salesperson  Region

CustID  Name
CustID  Salesperson

Now, there are no transitive dependencies, both relations are


in 3rd NF
BCNF
Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form
used in database normalization.

It is a slightly stronger version of the third normal form (3NF).


Rules for BCNF
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy
the following two conditions:
It should be in the Third Normal Form.
And, for any dependency A → B, A should be a super key.
it means, that for a dependency A → B, A cannot be a non-prime
attribute, if B is a prime attribute.
Example
student_id subject professor
101 Java P.Java
101 C++ P.Cpp
102 Java P.Java2
103 C# P.Chash
104 Java P.Java

Student id and subject together form PK. one professor teaches only
one subject, but one subject may have two different professors.

Hence, subject depends on the professor name.


To make this relation(table) satisfy BCNF, we will decompose this table
into two tables, student table and professor table.
Student Table

student_id p_id

And, Professor Table

p_id professor subject


Inference rules(Armstrong’s
Axioms)
The following six rules IR1 through IR6 are well-known inference rules for functional
dependencies:

IR1 (reflexive rule)1: If X ⊇ Y, then X →Y.

IR2 (augmentation rule)2: {X → Y} |=XZ → YZ.

IR3 (transitive rule): {X → Y, Y → Z} |=X → Z.

IR4 (decomposition, or projective rule): {X → YZ} |=X → Y.

IR5 (union, or additive rule): {X → Y, X → Z} |=X → YZ.

IR6 (pseudotransitive rule): {X → Y, WY → Z} |=WX → Z.


4NF
For a table to satisfy the Fourth Normal Form, it should satisfy the following
two conditions:
1. It should be in the Boyce-Codd Normal Form.
2. And, the table should not have any Multi-valued Dependency.
For a dependency A → B, if for a single value of A, multiple value of B
exists, then the table may have multi-valued dependency.
a table should have at-least 3 columns for it to have a multi-valued
dependency.
And, for a relation R(A,B,C), if there is a multi-valued dependency between,
A and B, then B and C should be independent of each other.
If all these conditions are true for any relation(table), it is said to have multi-
valued dependency.
Example
Below we have a college enrolment table with columns s_id, course
and hobby.
student with s_id 1 has opted for two courses, Science and Maths, and
has two hobbies, Cricket and Hockey.

s_id course hobby


1 Science Cricket
1 Maths Hockey
2 C# Cricket
2 Php Hockey
 The two records for student with s_id 1, will give rise to two more
records, as shown below, because for one student, two hobbies exists,
hence along with both the courses, these hobbies should be specified.

s_id course hobby


1 Science Cricket
1 Maths Hockey
1 Science Hockey
1 Maths Cricket
CourseOpted Table Hobbies Table
s_id course s_id hobby
1 Science 1 Cricket
1 Maths 1 Hockey
2 C# 2 Cricket
2 Php 2 Hockey
Attribute closure(closure set)
set of attributes which can be functionally determined from it.
Example: R(A,B,C,D,E)

FD: {A → B, B → C , C → D, D → E }

 A+ = {ABCDE}

AD + = {ADBEC}

B + = {BCDE}
Equivalence of Sets of Functional
Dependencies
 A set of functional dependencies F is said to cover another set of
functional dependencies E if every FD in E is also in F+; that is, if every
dependency in E can be inferred from F; alternatively, we can say that E is
covered by F.

Two sets of functional dependencies E and F are equivalent if E+ = F+.

Therefore, equivalence means that every FD in E can be inferred from F,


and every FD in F can be inferred from E; that is, E is equivalent to F if
both the conditions—E covers F and F covers E—hold.
Minimal Sets(canonical) of
Functional Dependencies
 F to be minimal if it satisfies the following conditions:

1. singleton RHS

2. No extraneous(two or more) attribute in LHS

3. Eliminate redundant
Example
 R(A,B,C,D,E)
FD = { A → D, BC → AD, C → B, E → A, E → D}
Step 1 singleton RHS
{ A → D, BC → A, BC → D, C → B, E → A, E → D}
Step 2 extraneous attribute in LHS
BC → A, BC → D (eliminate either B,C in the first and B,C in the second)

To eliminate first compute B and C closure using the above FD

{ A → D, BC → A, BC → D, C → B, E → A, E → D}

B+ = B , C+ = CBAD so the C closure already contains the B so we can


eliminate B(this means we can reach to B using the C closure)
Example
{ A → D, BC → A, BC → D, C → B, E → A, E → D} eliminate B

= { A → D, C → A, BC → D, C → B, E → A, E → D}

B+ = B , C+ = CBAD so the C closure already contains the B so we


can eliminate B(this means we can reach to B using the C closure)
once again

{ A → D, C → A, BC → D, C → B, E → A, E → D} eliminate B

= { A → D, C → A, C → D, C → B, E → A, E → D}
Example
FD = { A → D, C → A, C → D, C → B, E → A, E → D}

Step 3: starting from first FD finding the redundant one so we take A→D and if A+
contains the D then we can eliminate other wise we can’t. when computing A + assuming
we eliminate A → D we get = A , we can’t eliminate so this stays and move to the second
one.

{ A → D, C → A, C → D, C → B, E → A, E → D} compute C + assuming you eliminate


C → A , C + = CDB it doesn’t contain A so we need this.

{ A → D, C → A, C → D, C → B, E → A, E → D} , compute C +
assuming you
eliminate C → D, C + = CADB , now we can eliminate C → D

Applying this to all FD we get = { A → D, C → A, C → B, E → A}


Properties of Relational Decompositions

In a database, it breaks the table into multiple tables. If the relation has no proper
decomposition, then it may lead to problems like loss of information. Decomposition
is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy.

The properties of a relational decomposition are

1.Attribute Preservation: The attributes in R will appear in at least one relation


schema Ri in the decomposition, i.e., no attribute is lost.

2.Dependency preservation :If a decomposition is not dependency preserving some


dependency is lost in decomposition.

If R decomposed into R1,R2,R3 then all the F1,F2,F3 Union should cover FD in R
Cont’d

3. Non Additive Join Property: ensures that no spurious tuples are generated when
a NATURAL JOIN operation is applied to the relations resulting from the
decomposition.

4. No redundancy: Decomposition is used to eliminate some of the problems of bad


design like anomalies, inconsistencies, and redundancy. If the relation has no proper
decomposition, then it may lead to problems like loss of information.

5. Lossless join property: It is the ability to ensure that any instance of the original
relation can be identified from corresponding instances in the smaller relations.
Cont’d

We must carefully consider the problems associated with NULLs when designing a

relational database schema. There is no fully satisfactory relational design theory as

yet that includes NULL values.

A tuple that does not participate in a natural join is called a dangling tuple.
Dangling tuples may indicate a consistency problem in the database.
Example
R (A,B,C,D,E)
FD: {A → B, B → C, C → D, D → A}
Check if these two decompositions preserve FD. R1(A,B,C) and R2(C,D,E) First, find the FD set
for R1
A+ = ABCD (A →BC) Since D IS NOT IN R1 we eliminate D from FD1
B+ = BCDA (B →CA)
C+ = CDAB (C →AB)
AB+ = ABCD (AB →C)
BC+ = BCDA (BC →A) These two are already in the above 3 closure above so FD1 will be
{A →BC, B →CA, C →AB}
Following same process for R2 we get FD2 {C → D, D → C}
Example
Then FD1 U FD2 should be equivalent to FD
i.e FD {A → B, B → C, C → D, D → A} should be a member of
FD1 U FD2 {A →BC, B →CA, C →AB, C → D, D → C}
Therefore, check each FD exist in FD1 U FD2, All the first 3FD exist in the union relation. but what
about the fourth? It does not seem to exist directly so we will check if it exists indirectly by finding
out the determinant D closure
D+ = DCAB since A is a member in closure set of D then we can say D → A is member of FD1 U
FD2 {A →BC, B →CA, C →AB, C → D, D → C}
We can conclude that the decomposed relation preserve dependency.

You might also like