Module-III
Chapter-3
Lecture-1
Database Design
Learning Objective:
3.1 Introduction
3.2 Design Guidelines for Relation Schemas
3.1 Introduction:
Database design may be performed using two approaches: Bottom-up or Top-down.
A Bottom-up design methodology (also called design by synthesis) considers the basic relationships
among individual attributes as the starting point and uses those to construct relation schemas. This
approach is not very popular in practice because it suffers from the problem of having to collect a large
number of binary relationships among attributes as the starting point. For practical situations, it is next to
impossible to capture binary relationships among all such pairs of attributes.
A Top-down design methodology (also called design by analysis) starts with a number of groupings of
attributes into relations that exist together naturally, for example, on an invoice, a form, or a report. The
relations are then analyzed individually and collectively, leading to further decomposition until all
desirable properties are met.
Relational database design ultimately produces a set of relations. The implicit goals of the design activity
are information preservation and minimum redundancy.
3.2 Design Guidelines for Relation Schemas
Guideline 1: Design a relation schema so that it is easy to explain its meaning. Do not combine attributes
from multiple entity types and relationship types into a single relation. Intuitively, if a relation
schema corresponds to one entity type or one relationship type, it is straightforward to
interpret and to explain its meaning. Otherwise, if the relation corresponds to a mixture of
multiple entities and relationships, semantic ambiguities will result and the relation cannot be
easily explained.
Examples of Violating Guideline 1. The relation schemas in Figures 15.3(a) and 15.3(b) also have clear
semantics. A tuple in the EMP_DEPT relation schema in Figure 15.3(a) represents a single employee but
includes additional information—namely, the name (Dname) of the department for which the employee
works and the Social Security number (Dmgr_ssn) of the department manager. For the EMP_PROJ
relation in Figure 15.3(b), each tuple relates an employee to a project but also include the employee
name (Ename), project name (Pname), and project location (Plocation).
Figure 3.1: Above two relation schemas suffering from anomalies.
The above relation leads to an additional problem referred to as update anomalies. These can be
classified into
1. Insertion anomalies,
2. Deletion anomalies
3. Modification anomalies
1. Insertion Anomalies:
Insertion anomalies can be differentiated into two types, illustrated by the following examples based
on the EMP_DEPT relation:
To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for
the department that the employee works for, or NULLs (if the employee does not work for a
department as yet).
For example, to insert a new tuple for an employee who works in department number 5, we must
enter all the attribute values of department 5 correctly so that they are consistent with the
corresponding values for department 5 in other tuples in EMP_DEPT.
It is difficult to insert a new department that has no employees as yet in the EMP_DEPT relation.
The only way to do this is to place NULL values in the attributes for employee. This violates the
entity integrity for EMP_DEPT because Ssn is its primary key.
2. Deletion Anomalies:
The problem of deletion anomalies is related to the second insertion anomaly situation just
discussed. If we delete from EMP_DEPT an employee tuple that happens to represent the last
employee working for a particular department, the information concerning that department is lost
from the database.
3. Modification Anomalies:
In EMP_DEPT, if we change the value of one of the attributes of a particular department say, the
manager of department 5 we must update the tuples of all employees who work in that
department; otherwise, the database will become inconsistent. If we fail to update some tuples,
the same department will be shown to have two different values for manager in different
employee tuples, which would be wrong.
Guideline 2: Design the base relation schemas so that no insertion, deletion, or modification anomalies are
present in the relations. If any anomalies are present, note them clearly and make sure that the
programs that update the database will operate correctly.
Guideline 3: As far as possible, avoid placing attributes in a base relation whose values may frequently
be NULL. If NULLs are unavoidable, make sure that they apply in exceptional cases only and
do not apply to a majority of tuples in the relation.
NULLs can have multiple interpretations, such as the following:
The attribute does not apply to this tuple. For example, Visa_status may not apply to U.S.
students.
The attribute value for this tuple is unknown. For example, the Date_of_birth may be
unknown for an employee.
The value is known but absent; that is, it has not been recorded yet. For example, the
Home_Phone_Number for an employee may exist, but may not be available and recorded
yet.
Using space efficiently and avoiding joins with NULL values are the two overriding criteria
that determine whether to include the columns that may have NULLs in a relation or to have a
separate relation for those columns.
Guideline 4: Design relation schemas so that they can be joined with equality conditions on attributes that
are appropriately related (primary key, foreign key) pairs in a way that guarantees that no
spurious tuples are generated. Avoid relations that contain matching attributes that are not
(foreign key, primary key) combinations because joining on such attributes may produce
spurious tuples.