DBMS Module-4 Notes

This document covers the theory of database design, focusing on normalization techniques to eliminate data redundancy and anomalies in relational schemas. It outlines informal design guidelines, functional dependencies, and various normal forms, including 4NF and 5NF, to ensure efficient database structure. The document also discusses issues related to NULL values, spurious tuples, and provides algorithms for relational database schema design.

[21CS53]

Module 4
Chapter 1: Database Design Theory
4.0 Introduction
4.1 Objective
4.2 Introduction to DB design
4.3 Informal Design Guidelines for Relation Schemas
4.3.1 Imparting Clear Semantics to Attributes in Relations
4.3.2 Redundant Information in Tuples and Update Anomalies
4.3.3 NULL Values in Tuples
4.3.4 Generation of Spurious Tuples
4.4 Functional Dependencies
4.4.1 Normalization of Relations
4.4.2 Practical Use of Normal Forms
4.4.3 Definitions of Keys and Attributes Participating in Keys
4.4.4 First Normal Form
4.4.5 Second Normal Form
4.4.6 Third Normal Form
4.5 General Definition of Second and Third Normal Form
4.6 Boyce-Codd Normal Form
4.7 Multivalued Dependency and Fourth Normal Form
4.7.1 Formal Definition of Multivalued Dependency
4.8 Join Dependencies and Fifth Normal Form
4.9 Inference Rules for Functional Dependencies
4.10 Equivalence of Sets of Functional Dependencies
4.11 Sets of Functional Dependencies
4.12 Properties of Relational Decompositions
4.13 Algorithms for Relational Database Schema Design
4.13.1 Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF
Schemas
4.13.2 Nonadditive Join Decomposition into BCNF Schemas
4.13.3 Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF
Schemas
4.14 About Nulls, Dangling Tuples, and Alternative Relational Designs
4.14.1 Problems with NULL Values and Dangling Tuples
4.15 Other Dependencies and Normal Forms
4.15.1 Inclusion Dependencies
4.15.2 Template Dependencies
4.15.3 Functional Dependencies Based on Arithmetic Functions and Procedures
4.15.4 Domain-Key Normal Form
4.16 Assignment Questions
4.17 Expected Outcome
4.18 Further Reading

https://vtucode.in

4.0 Introduction

Database Normalization is a technique of organizing the data in the database.


Normalization is a systematic approach of decomposing tables to eliminate data
redundancy and undesirable characteristics like Insertion, Update and Deletion Anomalies.
It is a multi-step process that puts data into tabular form by removing duplicated data from
the relation tables. This module discusses the basic and higher normal forms.

4.1 Objectives

To study the process of normalization and refine the database design


To normalize the tables up to 4NF and 5NF
To study lossless and lossy join operations
To study inference rules
To study other dependencies and Normal Forms.

4.2 Introduction to DB design

Each relation schema consists of a number of attributes, and the relational database schema
consists of a number of relation schemas. So far, we have assumed that attributes are grouped
to form a relation schema by using the common sense of the database designer or by mapping
a database schema design from a conceptual data model such as the ER or Enhanced-ER
(EER) data model. These models make the designer identify entity types and relationship types
and their respective attributes, which leads to a natural and logical grouping of the attributes into
relations.
Database Design
There are two levels at which we can discuss the goodness of relation schemas:

1. The logical (or conceptual) level: how users interpret the relation schemas and the
meaning of their attributes.
2. The implementation (or physical storage) level: how the tuples in a base relation are
stored and updated. This level applies only to schemas of base relations.
An Example
STUDENT relation with attributes: studName, rollNo, gender, studDept
DEPARTMENT relation with attributes: deptName, officePhone, hod
Several students belong to a department

Correct schema: two separate relations, STUDENT and DEPARTMENT.

Incorrect schema: a single relation STUDDEPT combining student and department attributes.

Problems with bad schema


Redundant storage of data:
- Office Phone and HOD info are stored redundantly, once with each student that
belongs to the department
- wastage of disk space
A program that updates the Office Phone of a department:
- must change it at several places
- more running time
- error-prone
4.3 Informal Design Guidelines for Relation Schemas
Four informal guidelines that may be used as measures to determine the quality of
relation schema design:
1. Making sure that the semantics of the attributes is clear in the schema
2. Reducing the redundant information in tuples
3. Reducing the NULL values in tuples
4. Disallowing the possibility of generating spurious tuples
These measures are not always independent of one another
4.3.1 Imparting Clear Semantics to Attributes in Relations
The semantics of a relation refers to its meaning resulting from the interpretation of
attribute values in a tuple.
Whenever we group attributes to form a relation schema, we assume that attributes
belonging to one relation have certain real-world meaning and a proper interpretation
associated with them
The easier it is to explain the semantics of the relation, the better the relation schema
design will be
Guideline 1
Design a relation schema so that it is easy to explain its meaning
Do not combine attributes from multiple entity types and relationship types into a single
relation


If a relation schema corresponds to one entity type or one relationship type, it is
straightforward to interpret and to explain its meaning.
If the relation corresponds to a mixture of multiple entities and relationships,
semantic ambiguities will result and the relation cannot be easily explained.

Examples of Violating Guideline 1

Fig: schema diagram for company database


Both the relation schemas have clear semantics
A tuple in the EMP_DEPT relation schema represents a single employee but includes
additional information the name (Dname) of the department for which the employee
works and the Social Security number (Dmgr_ssn) of the department manager.
A tuple in the EMP_PROJ relates an employee to a project but also includes the
employee name (Ename), project name (Pname), and project location (Plocation)
logically correct but they violate Guideline 1 by mixing attributes from distinct real-world
entities:
EMP_DEPT mixes attributes of employees and departments
EMP_PROJ mixes attributes of employees and projects and the WORKS_ON
relationship
They may be used as views, but they cause problems when used as base relations

4.3.2 Redundant Information in Tuples and Update Anomalies


One goal of schema design is to minimize the storage space used by the base relations
Grouping attributes into relation schemas has a significant effect on storage space
For example, compare the space used by the two base relations EMPLOYEE and
DEPARTMENT with that for an EMP_DEPT base relation
In EMP_DEPT, the attribute values pertaining to a particular department (Dnumber,
Dname, Dmgr_ssn) are repeated for every employee who works for that department;
contrast this with the EMPLOYEE relation. Only the department number Dnumber is
repeated in the EMPLOYEE relation for each employee who works in that department,
as a foreign key.

Figure 1: One possible database state for the COMPANY relational database schema


Fig: Sample states for EMP_DEPT and EMP_PROJ resulting from applying NATURAL JOIN to the relations in Figure 1


Storing natural joins of base relations leads to an additional problem referred to as update
anomalies. These can be classified into:
insertion anomalies
deletion anomalies
modification anomalies
Insertion Anomalies
Insertion anomalies can be differentiated into two types, illustrated by the following
examples based on the EMP_DEPT relation:

1. To insert a new employee tuple into EMP_DEPT, we must include either the attribute
values for the department that the employee works for, or NULLs
- For example, to insert a new tuple for an employee who works in department number
5, we must enter all the attribute values of department 5 correctly so that they are
consistent with the corresponding values for department 5 in other tuples in
EMP_DEPT
- In the design of Employee in fig 1, we do not have to worry about this consistency
problem because we enter only the department number in the employee tuple; all
other attribute values of department 5 are recorded only once in the database, as a
single tuple in the DEPARTMENT relation
2. It is difficult to insert a new department that has no employees as yet in the EMP_DEPT
relation. The only way to do this is to place NULL values in the attributes for employee

- This violates the entity integrity for EMP_DEPT because Ssn is its primary key
- This problem does not occur in the design of Figure 1 because a department is
entered in the DEPARTMENT relation whether or not any employees work for it,
and whenever an employee is assigned to that department, a corresponding tuple
is inserted in EMPLOYEE.
Deletion Anomalies

The problem of deletion anomalies is related to the second insertion anomaly situation
just discussed
- If we delete from EMP_DEPT an employee tuple that happens to represent the
last employee working for a particular department, the information concerning that
department is lost from the database
- This problem does not occur in the database of Figure 1 because DEPARTMENT
tuples are stored separately.


Modification Anomalies
In EMP_DEPT, if we change the value of one of the attributes of a particular
department say, the manager of department 5 we must update the tuples of all
employees who work in that department; otherwise, the database will become
inconsistent
If we fail to update some tuples, the same department will be shown to have two
different values for manager in different employee tuples, which would be wrong

Guideline 2
Design the base relation schemas so that no insertion, deletion, or modification
anomalies are present in the relations
If any anomalies are present, note them clearly and make sure that the programs that
update the database will operate correctly
The second guideline is consistent with and, in a way, a restatement of the first guideline
These guidelines may sometimes have to be violated in order to improve the
performance of certain queries.
4.3.3 NULL Values in Tuples
If many of the attributes do not apply to all tuples in the relation, we end up with many
NULLs in those tuples
- this can waste space at the storage level
- may lead to problems with understanding the meaning of the attributes
- may also lead to problems with specifying JOIN operations
- how to account for them when aggregate operations such as COUNT or SUM are
applied
SELECT and JOIN operations involve comparisons; if NULL values are present, the
results may become unpredictable.
Moreover, NULLs can have multiple interpretations, such as the following:
The attribute does not apply to this tuple. For example, Visa_status may not apply
to U.S. students.
The attribute value for this tuple is unknown. For example, the Date_of_birth may
be unknown for an employee.
The value is known but absent; that is, it has not been recorded yet. For example,
the Home_Phone_Number for an employee may exist, but may not be available
and recorded yet.

Guideline 3


As far as possible, avoid placing attributes in a base relation whose values may
frequently be NULL
If NULLs are unavoidable, make sure that they apply in exceptional cases only and do
not apply to a majority of tuples in the relation
Using space efficiently and avoiding joins with NULL values are the two overriding
criteria that determine whether to include the columns that may have NULLs in a relation
or to have a separate relation for those columns with the appropriate key columns

For example, if only 15 percent of employees have individual offices, there is little
justification for including an attribute Office_number in the EMPLOYEE relation; rather,
a relation EMP_OFFICES(Essn, Office_number) can be created to include tuples for
only the employees with individual offices.

4.3.4 Generation of Spurious Tuples

Consider the two relation schemas EMP_LOCS and EMP_PROJ1 which can be used
instead of the single EMP_PROJ

A tuple in EMP_LOCS means that the employee whose name is Ename works on some
project whose location is Plocation

A tuple in EMP_PROJ1 refers to the fact that the employee whose Social Security
number is Ssn works Hours per week on the project whose name, number, and location
are Pname, Pnumber, and Plocation.


Suppose that we used EMP_PROJ1 and EMP_LOCS as the base relations instead of
EMP_PROJ. This produces a particularly bad schema design because we cannot
recover the information that was originally in EMP_PROJ from EMP_PROJ1 and
EMP_LOCS

If we attempt a NATURAL JOIN operation on EMP_PROJ1 and EMP_LOCS, the result
produces many more tuples than the original set of tuples in EMP_PROJ.
Additional tuples that were not in EMP_PROJ are called spurious tuples because they
represent spurious information that is not valid.
The spurious tuples are marked by asterisks (*)

Decomposing EMP_PROJ into EMP_LOCS and EMP_PROJ1 is undesirable because
when we JOIN them back using NATURAL JOIN, we do not get the correct original
information.


This is because in this case Plocation is the attribute that relates EMP_LOCS and
EMP_PROJ1, and Plocation is neither a primary key nor a foreign key in either
EMP_LOCS or EMP_PROJ1.
Guideline 4
Design relation schemas so that they can be joined with equality conditions on attributes
that are appropriately related (primary key, foreign key) pairs in a way that guarantees
that no spurious tuples are generated
Avoid relations that contain matching attributes that are not (foreign key, primary key)
combinations because joining on such attributes may produce spurious tuples.
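The spurious-tuple effect described above can be reproduced directly. The sketch below uses made-up tuples (two genuine employee-project facts) and a naive natural-join helper; none of these names or values come from the notes:

```python
# Naive natural join: match tuples on all attributes the two relations share.
def natural_join(r1, r2):
    common = set(r1[0]) & set(r2[0])
    out = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in common):
                out.append({**t1, **t2})
    return out

# Two real facts: Smith (123) works on project 1, Wong (456) on project 2,
# both projects located in Bellaire.
emp_proj1 = [
    {"Ssn": "123", "Pnumber": 1, "Plocation": "Bellaire"},
    {"Ssn": "456", "Pnumber": 2, "Plocation": "Bellaire"},
]
emp_locs = [
    {"Ename": "Smith", "Plocation": "Bellaire"},
    {"Ename": "Wong",  "Plocation": "Bellaire"},
]

# Joining only on Plocation (not a key of either relation) pairs every
# employee with every project at the same location.
joined = natural_join(emp_proj1, emp_locs)
print(len(joined))  # 4 tuples, although only 2 employee-project facts existed
```

The two extra tuples are the spurious ones: the join attribute Plocation is neither a primary key nor a foreign key, so matching on it invents facts.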

4.4 Functional Dependencies


Formal tool for analysis of relational schemas that enables us to detect and describe
some of the problems in precise terms
Definition of Functional Dependency
A functional dependency is a constraint between two sets of attributes from the
database.
Given a relation R, a set of attributes X in R is said to functionally determine another
attribute Y, also in R, (written X → Y) if and only if each X value is associated with at
most one Y value.
X is the determinant set and Y is the dependent attribute. Thus, given a tuple and the
values of the attributes in X, one can determine the corresponding value of the Y
attribute.
The abbreviation for functional dependency is FD or f.d. The set of attributes X is called
the left-hand side of the FD, and Y is called the right-hand side.
A functional dependency is a property of the semantics or meaning of the attributes.
The database designers will use their understanding of the semantics of the attributes of
R to specify the functional dependencies that should hold on all relation states
(extensions) r of R.
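Since a relation state can refute an FD but never prove it, a small check routine is a useful companion to this definition. A minimal sketch; the tuple values and the helper name are illustrative, not from the notes:

```python
def fd_holds(tuples, X, Y):
    """Return True if, in this extension, each X-value is associated
    with at most one Y-value (i.e. no tuple pair violates X -> Y)."""
    seen = {}
    for t in tuples:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False  # two tuples agree on X but disagree on Y
        seen[x_val] = y_val
    return True

# Illustrative EMP_PROJ-style extension (values are made up)
emp_proj = [
    {"Ssn": "123", "Pnumber": 1, "Ename": "Smith"},
    {"Ssn": "123", "Pnumber": 2, "Ename": "Smith"},
    {"Ssn": "456", "Pnumber": 1, "Ename": "Wong"},
]

print(fd_holds(emp_proj, ["Ssn"], ["Ename"]))    # True: no violation here
print(fd_holds(emp_proj, ["Pnumber"], ["Ssn"]))  # False: Pnumber 1 maps to two Ssn values
```

Note that a True result only means this particular extension does not violate the FD; the FD itself must still be asserted from the semantics of the attributes.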

Consider the relation schema EMP_PROJ;

From the semantics of the attributes and the relation, we know that the following
functional dependencies should hold:

FD1: {Ssn, Pnumber} → Hours
FD2: Ssn → Ename
FD3: Pnumber → {Pname, Plocation}

These functional dependencies specify that
(a) the value of Ssn uniquely determines the employee name (Ename),
(b) the value of Pnumber uniquely determines
the project name (Pname) and location (Plocation), and
(c) a combination of Ssn and Pnumber values uniquely determines
the number of hours the employee currently works on the
project per week (Hours).
Alternatively,we say that Ename is functionally determined by (or functionally dependent
on) Ssn, or given a value of Ssn, we know the value of Ename, and so on.
Relation extensions r(R) that satisfy the functional dependency constraints are called
legal relation states (or legal extensions) of R
A functional dependency is a property of the relation schema R, not of a particular legal
relation state r of R
Therefore, an FD cannot be inferred automatically from a given relation extension r but
must be defined explicitly by someone who knows the semantics of the attributes of R
Diagrammatic notation for displaying FDs
Each FD is displayed as a horizontal line
The left-hand-side attributes of the FD are connected by vertical lines to the line
representing the FD
The right-hand-side attributes are connected by the lines with arrows pointing toward the
attributes.

Fig: diagrammatic notation for displaying FDs

Example:


The following FDs may hold because the four tuples in the current extension have no
violation of these constraints:
B → C
C → B
{A, B} → C
{A, B} → D
{C, D} → B
The following do not hold because we already have violations of them in the given
extension:
A → B (tuples 1 and 2 violate this constraint)
B → A (tuples 2 and 3 violate this constraint)
D → C (tuples 3 and 4 violate it)

Normal Forms Based on Primary Keys

We assume that a set of functional dependencies is given for each relation and that
each relation has a designated primary key.
This information combined with the tests (conditions) for normal forms drives the
normalization process for relational schema design.
The general definitions of the first three normal forms, given later, take into account
all candidate keys of a relation rather than only the primary key.

4.4.1 Normalization of Relations


The normalization process, as first proposed by Codd (1972a), takes a relation schema
through a series of tests to certify whether it satisfies a certain normal form.
Initially, Codd proposed three normal forms, which he called first, second, and third
normal form
All these normal forms are based on a single analytical tool: the functional dependencies
among the attributes of a relation
A fourth normal form (4NF) and a fifth normal form (5NF) were proposed, based on the
concepts of multivalued dependencies and join dependencies, respectively
Normalization of data can be considered a process of analyzing the given relation
schemas based on their FDs and primary keys to achieve the desirable properties of
(1) minimizing redundancy and
(2) minimizing the insertion, deletion, and update anomalies

Relation schemas that satisfy successively higher normal forms are of successively
better design quality.
Unsatisfactory relation schemas that do not meet certain conditions (the normal form
tests) are decomposed into smaller relation schemas that meet the tests and hence
possess the desirable properties.
Thus, the normalization procedure provides database designers with the following:
A formal framework for analyzing relation schemas based on their keys and
on the functional dependencies among their attributes
A series of normal form tests that can be carried out on individual relation
schemas so that the relational database can be normalized to any desired
degree

Definition: The normal form of a relation refers to the highest normal form condition that
it meets, and hence indicates the degree to which it has been normalized

4.4.2 Practical Use of Normal Forms


Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties

Database design as practiced in industry today pays particular attention to normalization
only up to 3NF, BCNF, or at most 4NF.

The database designers need not normalize to the highest possible normal form

Relations may be left in a lower normalization status, such as 2NF, for performance
reasons

Definition: Denormalization is the process of storing the join of higher normal form
relations as a base relation, which is in a lower normal form.

4.4.3 Definitions of Keys and Attributes Participating in Keys


Superkey: a set of attributes S of R that specifies a uniqueness constraint: no two
distinct tuples in any state r of R can have the same value for S.
A key K is a superkey with the additional property that removal of any attribute from K
will cause K not to be a superkey any more.
Example:
The attribute set {Ssn} is a key because no two employees tuples can have the same
value for Ssn


Any set of attributes that includes Ssn (for example, {Ssn, Name, Address}) is a
superkey.
If a relation schema has more than one key, each is called a candidate key
One of the candidate keys is arbitrarily designated to be the primary key, and the others
are called secondary keys
In a practical relational database, each relation schema must have a primary key
If no candidate key is known for a relation, the set of all its attributes can be treated as
a default superkey.
For example {Ssn} is the only candidate key for EMPLOYEE, so it is also the primary
key
Definition. An attribute of relation schema R is called a prime attribute of R if it is a
member of some candidate key of R. An attribute is called nonprime if it is not a prime
attribute, that is, if it is not a member of any candidate key.

In WORKS_ON relation Both Ssn and Pnumber are prime attributes whereas other
attributes are nonprime.
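These definitions can be tested mechanically with a simple attribute-closure routine (closure under FDs is treated formally with the inference rules later in the module). A minimal sketch against WORKS_ON; the helper names are assumptions:

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for X, Y in fds:
            if X <= result and not Y <= result:
                result |= Y
                changed = True
    return result

def is_superkey(K, R, fds):
    return closure(K, fds) == set(R)

def is_key(K, R, fds):
    # A key is a superkey from which no attribute can be removed.
    return is_superkey(K, R, fds) and all(
        not is_superkey(K - {a}, R, fds) for a in K)

R = {"Ssn", "Pnumber", "Hours"}          # WORKS_ON
fds = [({"Ssn", "Pnumber"}, {"Hours"})]

print(is_key({"Ssn", "Pnumber"}, R, fds))           # True: minimal superkey
print(is_key({"Ssn", "Pnumber", "Hours"}, R, fds))  # False: superkey but not minimal
```

Here Ssn and Pnumber come out as prime attributes (members of the key) and Hours as nonprime.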

4.4.4 First Normal Form


Defined to disallow multivalued attributes, composite attributes, and their combinations
It states that the domain of an attribute must include only atomic (simple, indivisible)
values and that the value of any attribute in a tuple must be a single value from the
domain of that attribute
1NF disallows relations within relations or relations as attribute values within tuples
The only attribute values permitted by 1NF are single atomic (or indivisible) values.
Consider the DEPARTMENT relation schema shown in Figure below

Primary key is Dnumber


We assume that each department can have a number of locations


The DEPARTMENT schema and a sample relation state are shown in Figure below

As we can see, this is not in 1NF because Dlocations is not an atomic attribute, as
illustrated by the first tuple in Figure
There are two ways we can look at the Dlocations attribute:
The domain of Dlocations contains atomic values, but some tuples can have a set of
these values. In this case, Dlocations is not functionally dependent on the primary key
Dnumber.
The domain of Dlocations contains sets of values and hence is nonatomic. In this
case, Dnumber → Dlocations, because each set is considered a single member of the
attribute domain.
In either case, the DEPARTMENT relation is not in 1NF.

There are three main techniques to achieve first normal form for such a relation:

1. Remove the attribute Dlocations that violates 1NF and place it in a separate relation
DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT. The
primary key of this relation is the combination {Dnumber, Dlocation}. A distinct tuple
in DEPT_LOCATIONS exists for each location of a department. This decomposes
the non-1NF relation into two 1NF relations.
2. Expand the key so that there will be a separate tuple in the original DEPARTMENT
relation for each location of a DEPARTMENT. In this case, the primary key
becomes the combination {Dnumber, Dlocation}. This solution has the disadvantage
of introducing redundancy in the relation


3. If a maximum number of values is known for the attribute (for example, if it is
known that at most three locations can exist for a department), replace the
Dlocations attribute by three atomic attributes: Dlocation1, Dlocation2, and
Dlocation3. This solution has the disadvantage of introducing NULL values if
most departments have fewer than three locations. Querying on this attribute
also becomes more difficult; for example, consider how you would write a query
to list the departments that have a particular location.
Of the three solutions, the first is generally considered best because it does not suffer
from redundancy and it is completely general, having no limit placed on a maximum
number of values.
First normal form also disallows multivalued attributes that are themselves composite.
These are called nested relations because each tuple can have a relation within it.

Figure above shows how the EMP_PROJ relation could appear if nesting is allowed.
Each tuple represents an employee entity, and a relation PROJS(Pnumber, Hours)
within each tuple represents the employee's projects and the hours per week that the
employee works on each project.
The schema of this EMP_PROJ relation can be represented as follows:
EMP_PROJ(Ssn, Ename, {PROJS(Pnumber, Hours)})
Ssn is the primary key of the EMP_PROJ relation and Pnumber is the partial key of the
nested relation; that is, within each tuple, the nested relation must have unique values of
Pnumber
To normalize this into 1NF, we remove the nested relation attributes into a new relation
and propagate the primary key into it; the primary key of the new relation will combine
the partial key with the primary key of the original relation
Decomposition and primary key propagation yield the schemas EMP_PROJ1 and
EMP_PROJ2,


4.4.5 Second Normal Form


Second normal form (2NF) is based on the concept of full functional dependency.
A functional dependency X → Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold any more; that is, for any
attribute A ∈ X, (X − {A}) does not functionally determine Y.
A functional dependency X → Y is a partial dependency if some attribute A can be
removed from X and the dependency still holds; that is, for some A ∈ X, (X − {A}) → Y.

Definition. A relation schema R is in 2NF if every nonprime attribute A in R is fully


functionally dependent on the primary key of R

The test for 2NF involves testing for functional dependencies whose left-hand side
attributes are part of the primary key
If the primary key contains a single attribute, the test need not be applied at all

The EMP_PROJ relation is in 1NF but is not in 2NF.


The nonprime attribute Ename violates 2NF because of FD2, as do the nonprime
attributes Pname and Plocation because of FD3
The functional dependencies FD2 and FD3 make Ename, Pname, and Plocation
partially dependent on the primary key {Ssn, Pnumber} of EMP_PROJ, thus violating the
2NF test.
If a relation schema is not in 2NF, it can be second normalized or 2NF normalized into a
number of 2NF relations in which nonprime attributes are associated only with the
part of the primary key on which they are fully functionally dependent.
Therefore, the functional dependencies FD1, FD2, and FD3 lead to the decomposition of
EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown in Figure below,
each of which is in 2NF.
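The 2NF test above can be sketched as a scan for FDs whose left-hand side is a proper subset of the primary key and whose right-hand side contains a nonprime attribute. The FD labels follow the EMP_PROJ example; the variable and helper names are assumptions:

```python
primary_key = {"Ssn", "Pnumber"}
prime = primary_key  # assuming {Ssn, Pnumber} is the only candidate key

fds = [  # (X, Y) pairs for EMP_PROJ
    (frozenset({"Ssn", "Pnumber"}), {"Hours"}),        # FD1: full dependency
    (frozenset({"Ssn"}), {"Ename"}),                   # FD2: partial
    (frozenset({"Pnumber"}), {"Pname", "Plocation"}),  # FD3: partial
]

# An FD violates 2NF when its LHS is a *proper* subset of the key (X < key)
# and it determines at least one nonprime attribute.
violations = [
    (sorted(X), sorted(Y - prime))
    for X, Y in fds
    if X < primary_key and (Y - prime)
]
print(violations)
# [(['Ssn'], ['Ename']), (['Pnumber'], ['Plocation', 'Pname'])]
```

Each violating FD indicates one of the decomposed relations (EP2 and EP3 here); the full dependency FD1 stays behind as EP1.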

4.4.6 Third Normal Form


Transitive functional dependency:
A functional dependency X → Y in a relation schema R is a transitive dependency if
there exists a set of attributes Z that is neither a candidate key nor a subset of any key
of R, and both X → Z and Z → Y hold.
Example:

Ssn → Dmgr_ssn is a transitive FD, since Ssn → Dnumber and Dnumber → Dmgr_ssn
hold


and Dnumber is neither a key itself nor a subset of the key of EMP_DEPT.

Ssn → Ename is nontransitive, since there is no set of attributes X where
Ssn → X and X → Ename.
Definition: A relation schema R is in third normal form (3NF) if it is in 2NF and no
non-prime attribute A in R is transitively dependent on the primary key
The relation schema EMP_DEPT is in 2NF, since no partial dependencies on a key
exist. However, EMP_DEPT is not in 3NF because of the transitive dependency of
Dmgr_ssn (and also Dname) on Ssn via Dnumber

We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas
ED1 and ED2

ED1 and ED2 represent independent entity facts about employees and departments
A NATURAL JOIN operation on ED1 and ED2 will recover the original relation
EMP_DEPT without generating spurious tuples
Problematic FDs:
Left-hand side is a proper subset of the primary key (partial dependency)
Left-hand side is a non-key attribute (transitive dependency)
2NF and 3NF normalization remove these problem FDs by decomposing the original
relation into new relations.
In general, we want to design our relation schemas so that they have neither partial nor
transitive dependencies because these types of dependencies cause the update
anomalies
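A transitive dependency can be spotted mechanically with an attribute-closure routine. A minimal sketch using the EMP_DEPT dependencies; the closure helper and the exact attribute lists are illustrative:

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for X, Y in fds:
            if X <= result and not Y <= result:
                result |= Y
                changed = True
    return result

# EMP_DEPT dependencies: Ssn determines the employee attributes and
# Dnumber; Dnumber determines the department attributes.
fds = [
    ({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),
]

print(sorted(closure({"Dnumber"}, fds)))    # Dnumber determines Dname, Dmgr_ssn
print("Dmgr_ssn" in closure({"Ssn"}, fds))  # True: reached via Dnumber, i.e. transitively
```

Because Dmgr_ssn enters the closure of {Ssn} only through the intermediate non-key attribute Dnumber, the dependency Ssn → Dmgr_ssn is transitive, which is exactly why EMP_DEPT fails 3NF.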


4.5 General Definition of Second and Third Normal Form


The general definitions take all candidate keys of a relation into account.
Definition of 2NF: A relation schema R is in second normal form (2NF) if every
nonprime attribute A in R is not partially dependent on any key of R.
Consider the relation schema LOTS which describes parcels of land for sale in various
counties of a state
Suppose that there are two candidate keys: Property_id# and {County_name, Lot#}; that
is, lot numbers are unique only within each county, but Property_id# numbers are unique
across counties for the entire state.

Based on the two candidate keys Property_id# and {County_name, Lot#}, the functional
dependencies FD1 and FD2 hold

We choose Property_id# as the primary key, but no special consideration will be given to
this key over the other candidate key
FD3 says that the tax rate is fixed for a given county (does not vary lot by lot within the
same county)
FD4 says that the price of a lot is determined by its area regardless of which county it is
in.
The LOTS relation schema violates the general definition of 2NF because Tax_rate is
partially dependent on the candidate key {County_name, Lot#}, due to FD3
To normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and LOTS2


We construct LOTS1 by removing the attribute Tax_rate that violates 2NF from LOTS
and placing it with County_name (the left-hand side of FD3 that causes the partial
dependency) into another relation LOTS2.
Both LOTS1 and LOTS2 are in 2NF.

Definition of 3NF: A relation schema R is in third normal form (3NF) if, whenever a
nontrivial functional dependency X → A holds in R, either (a) X is a superkey of R, or
(b) A is a prime attribute of R.
According to this definition, LOTS2 is in 3NF.
FD4 in LOTS1 violates 3NF because Area is not a superkey and Price is not a prime
attribute in LOTS1
To normalize LOTS1 into 3NF, we decompose it into the relation schemas LOTS1A and
LOTS1B

We construct LOTS1A by removing the attribute Price that violates 3NF from LOTS1 and
placing it with Area (the left-hand side of FD4 that causes the transitive dependency) into
another relation LOTS1B.

Both LOTS1A and LOTS1B are in 3NF


4.6 Boyce-Codd Normal Form


Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was
found to be stricter than 3NF
Every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in
BCNF
Definition. A relation schema R is in BCNF if whenever a nontrivial functional
dependency X → A holds in R, then X is a superkey of R.
The formal definition of BCNF differs from the definition of 3NF in that condition (b) of
3NF, which allows A to be prime, is absent from BCNF. That makes BCNF a stronger
normal form compared to 3NF.
In our example, FD5 violates BCNF in LOTS1A because AREA is not a superkey of
LOTS1A
FD5 satisfies 3NF in LOTS1A because County_name is a prime attribute (condition b),
but this condition does not exist in the definition of BCNF
We can decompose LOTS1A into two BCNF relations LOTS1AX and LOTS1AY. This
decomposition loses the functional dependency FD2 because its attributes no longer
coexist in the same relation after decomposition.

In practice, most relation schemas that are in 3NF are also in BCNF
Only if X → A holds in a relation schema R with X not being a superkey and A being a
prime attribute will R be in 3NF but not in BCNF
Example: consider the relation TEACH with the following dependencies:


FD1: {Student, Course} → Instructor
FD2: Instructor → Course -- means that each instructor teaches one course


{Student, Course} is a candidate key for this relation
The dependencies shown follow the pattern in Figure below with Student as A, Course
as B, and Instructor as C

Hence this relation is in 3NF but not BCNF


Decomposition of this relation schema into two schemas is not straightforward because
it may be decomposed into one of the three following possible pairs:
1. R1(Student, Instructor) and R2(Student, Course)
2. R1(Course, Instructor) and R2(Course, Student)
3. R1(Instructor, Course) and R2(Instructor, Student)
It is generally not sufficient to check separately that each relation schema in the
database is, say, in BCNF or 3NF
Rather, the process of normalization through decomposition must also confirm the
existence of additional properties that the relational schemas, taken together, should
possess. These would include two properties:
The nonadditive join or lossless join property, which guarantees that the spurious
tuple generation problem does not occur with respect to the relation schemas
created after decomposition.
The dependency preservation property, which ensures that each functional
dependency is represented in some individual relation resulting after decomposition.


We are not able to meet the functional dependency preservation property, but we must
meet the nonadditive join property
Nonadditive Join Test for Binary Decomposition:
A decomposition D = {R1, R2} of R has the lossless join property with respect to a set
of functional dependencies F on R if and only if either
The FD ((R1 ∩ R2) → (R1 − R2)) is in F+, or
The FD ((R1 ∩ R2) → (R2 − R1)) is in F+
The third decomposition meets the test:
R1 ∩ R2 is Instructor
R1 − R2 is Course
Hence, the proper decomposition of TEACH into BCNF relations is:
TEACH1(Instructor,Course) and TEACH2(Instructor,Student)
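The binary test is easy to mechanize: compute the closure of R1 ∩ R2 and check whether it covers all of R1 or all of R2. A Python sketch applying it to the three candidate decompositions of TEACH:

```python
def closure(attrs, fds):
    """Compute the attribute closure of attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """(R1 ∩ R2) must functionally determine R1 − R2 or R2 − R1."""
    c = closure(set(r1) & set(r2), fds)
    return set(r1) <= c or set(r2) <= c

fds = [(("Student", "Course"), ("Instructor",)),
       (("Instructor",), ("Course",))]

d1 = lossless_binary({"Student", "Instructor"}, {"Student", "Course"}, fds)
d2 = lossless_binary({"Course", "Instructor"}, {"Course", "Student"}, fds)
d3 = lossless_binary({"Instructor", "Course"}, {"Instructor", "Student"}, fds)
print(d1, d2, d3)  # False False True
```

Only the third decomposition passes, confirming TEACH1/TEACH2 as the proper choice.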
In general, a relation R not in BCNF can be decomposed so as to meet the nonadditive
join property by the following procedure, which decomposes R successively into a set of
relations that are in BCNF:

Let X → A be the functional dependency that causes violation of BCNF. R may be
decomposed into two relations:

R − A

XA

If either R − A or XA is not in BCNF, repeat the process
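The procedure can be sketched in Python. This is an illustrative, unoptimized implementation: for each relation it searches attribute subsets for a BCNF-violating FD (projecting the FDs onto the relation via closures) and recurses on R − Y and XY:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violation(rel, fds):
    """Find some X -> Y violating BCNF in rel (FDs projected via closures)."""
    rel = set(rel)
    for n in range(1, len(rel)):
        for x in combinations(sorted(rel), n):
            c = closure(x, fds) & rel
            if c != set(x) and c != rel:  # X determines more, but is no superkey
                return set(x), c - set(x)
    return None

def bcnf_decompose(rel, fds):
    """Successively replace rel by (rel - Y) and XY until all pieces are BCNF."""
    v = bcnf_violation(rel, fds)
    if v is None:
        return [set(rel)]
    x, y = v
    return bcnf_decompose(set(rel) - y, fds) + bcnf_decompose(x | y, fds)

fds = [(("Student", "Course"), ("Instructor",)),
       (("Instructor",), ("Course",))]
print(bcnf_decompose({"Student", "Course", "Instructor"}, fds))
```

On TEACH it yields the two BCNF relations {Instructor, Student} and {Instructor, Course}.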
4.7 Multivalued Dependency and Fourth Normal Form
A multivalued dependency X →→ Y holds in R when the set of Y values associated with
an X value is independent of the remaining attributes. For example, consider the relation
EMP shown in Figure below:

A tuple in this EMP relation represents the fact that an employee whose name is
Ename works on the project whose name is Pname and has a dependent whose
name is Dname
An employee may work on several projects and may have several dependents

Database Management System [18CS53]

Chapter 3: Database Application Development

2.1 Introduction
We often encounter situations in which we need the greater flexibility of a general-purpose
programming language in addition to the data manipulation facilities provided by SQL. For example,
we may want to integrate a database application with a GUI, or we may want to integrate with other
existing applications.

2.2 Accessing Databases from applications


SQL commands can be executed from within a program in a host language such as C or Java. A
language in which SQL queries are embedded is called the host language.

2.2.1 Embedded SQL

The use of SQL commands within a host language is called Embedded SQL. Conceptually,
embedding SQL commands in a host language program is straightforward. SQL statements can be
used wherever a statement in the host language is allowed. SQL statements must be clearly
marked so that a preprocessor can deal with them before invoking the compiler for the host
language. Any host language variable used to pass arguments into an SQL command must be
declared in SQL.
There are two complications:
1. Data types recognized by SQL may not be recognized by the host language and vice versa
- This mismatch is addressed by casting data values appropriately before passing them to or
from SQL commands.
2. SQL is set-oriented
- Addressed using cursors

Declaring Variables and Exceptions


SQL statements can refer to variables defined in the host program. Such host language variables
must be prefixed by a colon(:) in SQL statements and be declared between the commands

EXEC SQL BEGIN DECLARE SECTION and EXEC SQL END DECLARE SECTION


The declarations are similar to C, are separated by semicolons. For example, we can declare
variables c_sname, c_sid, c_rating, and c_age (with the initial c used as a naming convention to
emphasize that these are host language variables) as follows:
EXEC SQL BEGIN DECLARE SECTION

char c_sname[20];
long c_sid;
short c_rating;
float c_age;
EXEC SQL END DECLARE SECTION

The first question that arises is which SQL types correspond to the various C types, since we have
just declared a collection of C variables whose values are intended to be read (and possibly set) in
an SQL run-time environment when an SQL statement that refers to them is executed. The SQL-92
standard defines such a correspondence between the host language types and SQL types for a
number of host languages. In our example, c_sname has the type CHARACTER(20) when referred
to in an SQL statement, c_sid has the type INTEGER, c_rating has the type SMALLINT, and c_age
has the type REAL.

We also need some way for SQL to report what went wrong if an error condition arises when
executing an SQL statement. The SQL-92 standard recognizes two special variables for reporting
errors, SQLCODE and SQLSTATE.
SQLCODE is the older of the two and is defined to return some negative value when an
error condition arises, without specifying further just what error a particular negative
integer denotes.
SQLSTATE, introduced in the SQL-92 standard for the first time, associates predefined
values with several common error conditions, thereby introducing some uniformity to how
errors are reported.
One of these two variables must be declared. The appropriate C type for SQLCODE is long and the
appropriate C type for SQLSTATE is char[6], that is, a character string five characters long.
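For comparison, Python's DB-API reports errors through an exception hierarchy rather than status variables such as SQLSTATE; every driver raises subclasses of a common Error class. A minimal sqlite3 sketch (the table and rows are invented for illustration):

```python
import sqlite3

# In the DB-API, error conditions surface as typed exceptions instead of
# SQLSTATE codes that must be polled after every statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sailors (sid INTEGER PRIMARY KEY, sname TEXT)")
conn.execute("INSERT INTO sailors VALUES (1, 'Dustin')")

try:
    conn.execute("INSERT INTO sailors VALUES (1, 'Lubber')")  # duplicate key
except sqlite3.IntegrityError as e:
    outcome = "integrity violation: %s" % e

print(outcome)
```

The duplicate insert raises sqlite3.IntegrityError, the counterpart of an SQLSTATE integrity-constraint class.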

Embedding SQL statements


All SQL statements embedded within a host program must be clearly marked with the details
dependent on the host language. In C, SQL statements must be prefixed by EXEC SQL. An SQL
statement can essentially appear in any place in the host language program where a host language
statement can appear.
Example: The following embedded SQL statement inserts a row, whose column values are based
on the values of the host language variables contained in it, into the sailors relation
EXEC SQL INSERT INTO sailors VALUES (:c_sname, :c_sid, :c_rating,:c_age);
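The same host-variable idea appears in other APIs as parameter binding: ordinary program variables are bound to placeholders in the statement. A Python sqlite3 sketch of the insert above (schema simplified from the Sailors example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sailors (sname TEXT, sid INTEGER, rating INTEGER, age REAL)")

# Ordinary program variables stand in for the host variables :c_sname etc.
c_sname, c_sid, c_rating, c_age = "Dustin", 22, 7, 45.0
conn.execute("INSERT INTO sailors VALUES (:sname, :sid, :rating, :age)",
             {"sname": c_sname, "sid": c_sid, "rating": c_rating, "age": c_age})

row = conn.execute("SELECT * FROM sailors").fetchone()
print(row)  # ('Dustin', 22, 7, 45.0)
```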


The SQLSTATE variable should be checked for errors and exceptions after each Embedded SQL
statement. SQL provides the WHENEVER command to simplify this task:
EXEC SQL WHENEVER [SQLERROR | NOT FOUND ] [CONTINUE|GOTO stmt]
If SQLERROR is specified and the value of SQLSTATE indicates an exception, control is
transferred to stmt, which is presumably responsible for error and exception handling. Control is
also transferred to stmt if NOT FOUND is specified and the value of SQLSTATE is 02000, which
denotes NO DATA.

2.2.2 Cursors
A major problem in embedding SQL statements in a host language like C is that an impedance
mismatch occurs because SQL operates on sets of records, whereas languages like C do not
cleanly support a set-of-records abstraction. The solution is to essentially provide a mechanism that
allows us to retrieve rows one at a time from a relation- this mechanism is called a cursor
We can declare a cursor on any relation or on any SQL query. Once a cursor is declared, we can
open it (positions the cursor just before the first row)
Fetch the next row
Move the cursor (to the next row, to the row after the next, to the first or previous row,
etc., by specifying additional parameters for the FETCH command)
Close the cursor
Cursor allows us to retrieve the rows in a table by positioning the cursor at a particular row and
reading its contents.
Basic Cursor Definition and Usage
Cursors enable us to examine, in the host language program, a collection of rows computed by an
Embedded SQL statement:
We usually need to open a cursor if the embedded statement is a SELECT. We can avoid
opening a cursor if the answer contains a single row.
INSERT, DELETE and UPDATE statements require no cursor. Some variants of DELETE
and UPDATE use a cursor.
Examples:
i) Find the name and age of a sailor, specified by assigning a value to the host variable c_sid,
declared earlier
EXEC SQL SELECT s.sname, s.age
INTO :c_sname, :c_age
FROM sailors s
WHERE s.sid = :c_sid;


The INTO clause allows us to assign the columns of the single answer row to the host variables
c_sname and c_age. Therefore, we do not need a cursor to embed this query in a host language
program.

ii) Compute the name and ages of all sailors with a rating greater than the current value of the host
variable c_minrating
SELECT s.sname,s.age
FROM sailors s WHERE s.rating>:c_minrating;
The query returns a collection of rows. The INTO clause is inadequate. The solution is to use a
cursor:
DECLARE sinfo CURSOR FOR
SELECT s.sname,s.age
FROM sailors s
WHERE s.rating>:c_minrating;
This code can be included in a C program and once it is executed, the cursor sinfo is defined.
We can open the cursor by using the syntax:
OPEN sinfo;

This statement asks the DBMS to compute the answer rows and associate the cursor sinfo
with it. When the cursor is opened, it is positioned just before the first row.
We can use the FETCH command to read the first row of cursor sinfo into host language variables:
FETCH sinfo INTO :c_sname, :c_age;
When the FETCH statement is executed, the cursor is positioned to point at the next row and the
column values in the row are copied into the corresponding host variables. By repeatedly executing
this FETCH statement, we can read all the rows computed by the query, one row at time.
When we are done with a cursor, we can close it:
CLOSE sinfo;
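The DECLARE/OPEN/FETCH/CLOSE cycle maps naturally onto Python's DB-API cursors; a sketch with sqlite3 (sample data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sailors (sname TEXT, age REAL, rating INTEGER)")
conn.executemany("INSERT INTO sailors VALUES (?, ?, ?)",
                 [("Dustin", 45.0, 7), ("Lubber", 55.5, 8), ("Rusty", 35.0, 10)])

c_minrating = 7
cur = conn.cursor()                                  # DECLARE the cursor
cur.execute("SELECT sname, age FROM sailors "
            "WHERE rating > ?", (c_minrating,))      # OPEN: compute the answer
rows = []
while True:
    row = cur.fetchone()                             # FETCH the next row
    if row is None:                                  # no more rows (cf. SQLSTATE 02000)
        break
    rows.append(row)
cur.close()                                          # CLOSE
print(rows)  # [('Lubber', 55.5), ('Rusty', 35.0)]
```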
iii) To retrieve the name, address and salary of an employee specified by the variable ssn


Properties of Cursors
The general form of a cursor declaration is:
DECLARE cursorname [INSENSITIVE] [SCROLL] CURSOR
[WITH HOLD]
FOR some query
[ORDER BY order-item-list ]
[FOR READ ONLY | FOR UPDATE ]
A cursor can be declared to be a read-only cursor (FOR READ ONLY) or updatable cursor (FOR
UPDATE).If it is updatable, simple variants of the UPDATE and DELETE commands allow us to
update or delete the row on which the cursor is positioned. For example, if sinfo is an updatable
cursor and open, we can execute the following statement:
UPDATE Sailors S
SET S.rating = S.rating -1
WHERE CURRENT of sinfo;
A cursor is updatable by default unless it is a scrollable or insensitive cursor, in which case it is
read-only by default.

If the keyword SCROLL is specified, the cursor is scrollable, which means that variants of the
FETCH command can be used to position the cursor in very flexible ways; otherwise, only the basic
FETCH command, which retrieves the next row, is allowed

If the keyword INSENSITIVE is specified, the cursor behaves as if it is ranging over a private copy
of the collection of answer rows. Otherwise, and by default, other actions of some transaction could
modify these rows, creating unpredictable behavior.

A holdable cursor is specified using the WITH HOLD clause, and is not closed when the transaction
is committed.

Optional ORDER BY clause can be used to specify a sort order. The order-item-list is a list of order-
items. An order-item is a column name, optionally followed by one of the keywords ASC or DESC
Every column mentioned in the ORDER BY clause must also appear in the select-list of the query
associated with the cursor; otherwise it is not clear what columns we should sort on

ORDER BY minage ASC, rating DESC

The answer is sorted first in ascending order by minage, and if several rows have the same minage
value, these rows are sorted further in descending order by rating
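The combined sort order can be reproduced in a host language; a small Python sketch with invented (rating, minage) rows:

```python
# Invented (rating, minage) rows, as produced by a grouped query
rows = [(7, 35.0), (3, 25.5), (8, 25.5)]

# ORDER BY minage ASC, rating DESC: ascending minage, ties broken
# by descending rating (negating the numeric key reverses its order)
rows.sort(key=lambda r: (r[1], -r[0]))
print(rows)  # [(8, 25.5), (3, 25.5), (7, 35.0)]
```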


Rating  minage
8       25.5
3       25.5
7       35.0
Dynamic SQL
Dynamic SQL allows construction of SQL statements on the fly. Consider an application such as a
spreadsheet or a graphical front-end that needs to access data from a DBMS. Such an application
must accept commands from a user and, based on what the user needs, generate appropriate SQL
statements to retrieve the necessary data. In such situations, we may not be able to predict in
advance just what SQL statements need to be executed. SQL provides some facilities to deal with
such situations; these are referred to as Dynamic SQL.
Example:

char c_sqlstring[] = {"DELETE FROM Sailors WHERE rating>5"};


EXEC SQL PREPARE readytogo FROM :c_sqlstring;
EXEC SQL EXECUTE readytogo;
The first statement declares the C variable c_sqlstring and initializes its value to the string
representation of an SQL command
The second statement results in this string being parsed and compiled as an SQL command,
with the resulting executable bound to the SQL variable readytogo
The third statement executes the command
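The PREPARE/EXECUTE pattern corresponds to executing a run-time-assembled statement string in other APIs; a Python sqlite3 sketch (table and rows invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sailors (sid INTEGER, rating INTEGER)")
conn.executemany("INSERT INTO sailors VALUES (?, ?)", [(1, 9), (2, 3), (3, 7)])

# The statement text is assembled at run time, then parsed and executed,
# mirroring the PREPARE / EXECUTE pair of Dynamic SQL.
c_sqlstring = "DELETE FROM sailors WHERE rating > 5"
conn.execute(c_sqlstring)

remaining = conn.execute("SELECT COUNT(*) FROM sailors").fetchone()[0]
print(remaining)  # 1
```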

2.3 An Introduction to JDBC


Embedded SQL enables the integration of SQL with a general-purpose programming language. A
DBMS-specific preprocessor transforms the Embedded SQL statements into function calls in the
host language. The details of this translation vary across DBMSs, and therefore even though the
source code can be compiled to work with different DBMSs, the final executable works only with one
specific DBMS.
ODBC and JDBC, short for Open DataBase Connectivity and Java DataBase Connectivity, also
enable the integration of SQL with a general-purpose programming language.

In contrast to Embedded SQL, ODBC and JDBC allow a single executable to access
different DBMSs Without recompilation.


There are two types of iterator classes:


named iterators
positional iterators
For named iterators, we specify both the variable type and the name of each column of the iterator.
This allows us to retrieve individual columns by name. This method is used in our example.
For positional iterators, we need to specify only the variable type for each column of the iterator. To
access the individual columns of the iterator, we use a FETCH ... INTO construct, similar to
Embedded SQL
We can make the iterator a positional iterator through the following statement:

#sql iterator Books (String, Float);

We then retrieve the individual rows from the iterator as follows:

while (true) {
    #sql { FETCH :books INTO :title, :price };
    if (books.endFetch())
        { break; }
    // process the book
}
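The named-versus-positional distinction has a counterpart in Python's sqlite3, where the default cursor yields plain tuples (positional access) and the sqlite3.Row factory enables access by column name; a sketch (books table invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, price REAL)")
conn.execute("INSERT INTO books VALUES ('Emma', 12.5)")

# Positional access: the default cursor yields plain tuples
title, price = conn.execute("SELECT title, price FROM books").fetchone()

# Named access: sqlite3.Row supports lookup by column name
conn.row_factory = sqlite3.Row
row = conn.execute("SELECT title, price FROM books").fetchone()
print(title, row["price"])  # Emma 12.5
```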

2.6 STORED PROCEDURES


A stored procedure is a logical group of SQL statements grouped together to perform a
specific task.

Benefits :

It reduces the amount of information transferred between client and database server.

The compilation step is required only once, when the stored procedure is created. Thereafter
it does not require recompilation before executing (unless it is modified) and reuses the same
execution plan, whereas an SQL statement needs to be compiled every time it is sent for
execution, even if the same statement is sent each time.

It helps in reusability of the SQL code because it can be used by multiple users and by
multiple clients, since we need to just call the stored procedure instead of writing the
same SQL statement every time. It also helps in reducing development time.


Syntax:

Create or replace procedure <procedure Name> [(arg1 datatype, arg2 datatype)]

Is/As

<declaration>

Begin

<SQL Statement>

Exception

-----

-----

End procedurename;

2.6.1 Creating a Simple Stored Procedure


Consider the following schema:

Student(usn:string,sname:string)

Let us now write a stored procedure to count the students with a given name:

create or replace procedure ss

is

stu_cnt int;

begin

select count(*) into stu_cnt from student where sname='AKSHAY';

dbms_output.put_line('the count of student is :' || stu_cnt);

end ss;

Stored procedures can also have parameters. These parameters have to be valid SQL types, and
have one of three different modes: IN, OUT, or INOUT.

IN parameters are arguments to the stored procedure

OUT parameters are returned from the stored procedure; it assigns values to all OUT
parameters that the user can process

INOUT parameters combine the properties of IN and OUT parameters: They contain values
to be passed to the stored procedures, and the stored procedure can set their values as
return values


Example:

CREATE PROCEDURE AddInventory (

IN book_isbn CHAR(10),

IN addedQty INTEGER)

UPDATE Books SET qty_in_stock = qty_in_stock + addedQty

WHERE book_isbn = isbn

In Embedded SQL, the arguments to a stored procedure are usually variables in the host language.
For example, the stored procedure AddInventory would be called as follows:
EXEC SQL BEGIN DECLARE SECTION
char isbn[10];
long qty;
EXEC SQL END DECLARE SECTION
/ / set isbn and qty to some values
EXEC SQL CALL AddInventory(:isbn,:qty);

Stored procedures enforce strict type conformance: If a parameter is of type INTEGER, it cannot be
called with an argument of type VARCHAR.

Procedures without parameters are called static procedures and with parameters are called
dynamic procedures.

Example: stored procedure with parameter

create or replace procedure emp(Essn int)

as

eName varchar(20);

begin

select fname into eName from employee where ssn=Essn and dno=5;

dbms_output.put_line(' the employee name is :'||Essn ||eName);

end emp;

2.6.2 Calling Stored Procedures

Stored procedures can be called in interactive SQL with the CALL statement:

CALL storedProcedureName(argl, arg2, .. ,argN);


Calling Stored Procedures from JDBC


We can call stored procedures from JDBC using the CallableStatement class. A stored procedure
could contain multiple SQL statements or a series of SQL statements; thus, the result could be many
different ResultSet objects. We illustrate the case when the stored procedure result is a single
ResultSet.
CallableStatement cstmt = con.prepareCall("{call ShowNumberOfOrders}");
ResultSet rs = cstmt.executeQuery();
while (rs.next())
Calling Stored Procedures from SQLJ
The stored procedure 'ShowNumberOfOrders' is called as follows using SQLJ:

/ / create the cursor class

#sql Iterator CustomerInfo(int cid, String cname, int count);

/ / create the cursor

CustomerInfo customerinfo;

/ / call the stored procedure

#sql customerinfo = {CALL ShowNumberOfOrders};

while (customerinfo.next())

System.out.println(customerinfo.cid() + "," +

customerinfo.count());

2.6.3 SQL/PSM
SQL/Persistent Stored Modules is an ISO standard mainly defining an extension of SQL with
procedural language for use in stored procedures.

In SQL/PSM, we declare a stored procedure as follows:

CREATE PROCEDURE name (parameter1,... , parameterN)

local variable declarations

procedure code;


We can declare a function similarly as follows:

CREATE FUNCTION name (parameter1, ... , parameterN)

RETURNS sqlDataType

local variable declarations

function code;

Example:

CREATE FUNCTION RateCustomer (IN custId INTEGER, IN year INTEGER)

RETURNS INTEGER

DECLARE rating INTEGER;

DECLARE numOrders INTEGER;

SET numOrders = (SELECT COUNT(*) FROM Orders O WHERE O.cid = custId);

IF (numOrders > 10) THEN rating = 2;

ELSEIF (numOrders > 5) THEN rating = 1;

ELSE rating = 0;

END IF;

RETURN rating;
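The control flow of RateCustomer can be checked with a quick translation of its branch structure into Python (the COUNT query is replaced by a numOrders argument):

```python
def rate_customer(num_orders):
    """Mirror of the IF/ELSEIF/ELSE branch structure in RateCustomer."""
    if num_orders > 10:
        rating = 2
    elif num_orders > 5:
        rating = 1
    else:
        rating = 0
    return rating

print([rate_customer(n) for n in (12, 7, 3)])  # [2, 1, 0]
```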

We can declare local variables using the DECLARE statement. In our example, we declare two
local variables: 'rating', and 'numOrders'.

PSM/SQL functions return values via the RETURN statement. In our example, we return the
value of the local variable 'rating'.

We can assign values to variables with the SET statement. In our example, we assigned the
return value of a query to the variable 'numOrders'.

SQL/PSM has branches and loops. Branches have the following form:

IF (condition) THEN statements;

ELSEIF statements;

ELSEIF statements;

ELSE statements;

END IF

Loops are of the form


LOOP

statements;

END LOOP

Queries can be used as part of expressions in branches; queries that return a single value can be
assigned to variables.We can use the same cursor statements as in Embedded SQL (OPEN,
FETCH, CLOSE), but we do not need the EXEC SQL constructs, and variables do not have to be
prefixed by a colon ':'.

