Database Design and Management Part 1,2,3,4,5
Database Design and Management Part 1,2,3,4,5
CHAPTER 1
BASICS OF DATABASE SYSTEMS
1.1. INTRODUCTION
In our everyday life in the modern society, we come across databases and database systems quite
frequently. Several activities on a day-to-day basis require the interaction with one or another
database. For example, bank transactions like deposit or withdrawal of money, railway or airline
reservation, borrowing or returning a book from a computerized library, purchasing items from a
departmental store or through Internet, computerized hospital management etc. all involve
accessing and interacting with databases. Databases play an important role in almost all areas
where computers are used, including business, electronic commerce, engineering, medicine, law,
education and library science, to name a few.
Traditional Database applications
In the traditional database applications, most of the information stored and accessed is either
textual or numeric. Some traditional database applications are:
Banking
Inventory
Airline reservation
Library management
Hospital management
Advanced Database applications
Some of the advanced applications of database systems are
Multimedia databases-that can store pictures, video clips and sound messages.
Geographic information systems (GIS)- that can store and analyze maps, weather data
and satellite images
Data warehouses and Online analytical processing (OLAP)-that are used in many
companies to extract and analyze useful information from very large databases for
decision-making.
Real-time and Active database technology- that is used in controlling industrial and
manufacturing processes
Mobile Databases- They are available on the users portable computers. Users interact
with the mobile database application, which in turn accesses the data stored in the mobile
databases through the DBMS.
Web databases- databases are integrated with web to support business operations like ecommerce, supply chain management or web publishing
1.2. BASICS OF DATABASE TECHNOLOGY
DATA
Data is a collection of known facts, figures or statistics that can be recorded and that is
meaningful. Data can be text, numbers, audio, video or any combination of these. Data in its raw
form may not be useful for decision-making. It must be processed so as to become useful.
DATABASE
It is a collection of related data, having valid and implicit meaning and is designed to meet the
information needs of an organization.
Example 1: A personnel Telephone directory, which contains name, address and telephone
numbers, arranged in some particular order, say sorted according to name, acts as a collection of
related data having valid and implicit meaning and hence it is a telephone database.
Example 2: A college staff register, containing the data regarding name, date of joining, basic
salary, net salary etc. For each staff member, where arranged in some order, becomes a Staff
database.
Example 3: There can be a student database containing the details of the students like register
number, name, date of birth, fathers name, mothers name, marks details etc.
IMPLICIT PROPERTIES OF DATABASE
1. A database represents some aspect of the real world. A database has some source from
which data is derived. There will be some degree of interaction with events in the real
world. There will be multiple applications and multiple users who are interested in the
contents of the database.
2. A database is a logically coherent collection of data with some inherent meaning.
3. A database is designed, built and populated with data for a specific purpose.
4. A database can be of any size and any level of complexity. If huge amount of data is
involved, the data must be organized and managed, so that users can search for, retrieve
and update the data whenever needed.
5. A database may be generated and maintained manually or by a group of application
programs written specifically for that task or by a database management system.
DATABASE MANAGEMENT SYSTEM
A database management system (DBMS) is a collection of programs that enables users to create
and maintain a database. This is a software system that allows access of the data contained in the
database. The primary goal of a DBMS is to provide an environment that is both convenient and
efficient to use in storing and retrieving database information.
FUNCTIONS OF DBMS
DBMS is a general purpose software system that performs the following functions.
i) Defining a database
ii) Constructing the database
iii) Manipulating the database
iv) Sharing databases among various users
v) Protecting the database
vi) Maintenance of a database
(i) Defining a database
This involves specifying the data types, structures and constraints for the data to be stored in
the database.
(ii) Constructing the database
This is the process of storing the data on some storage medium that is controlled by DBMS.
DATABASE SYSTEM
Application Programs/Queries
DBMS SOFTWARE
Stored Database
Stored
Definition
Database
Figure 1.1 Typical Database System Environments
The application programs or queries that are written by the user or programmer are processed by
the DBMS software so as to perform the required function. The DBMS software can also
perform functions like storing the data in the specified structure, manipulating with the database
performing operations like updation, retrieval etc. Whenever request to access data is made, a
part of the DBMS software first refers to the metadata to access the structure and hence
determine the size and position so as to access data from the stored database.
1.3. TRADITIONAL FILE ORIENTED APPROACH
Earlier the information was maintained in permanent system files. To allow users to manipulate
the information, the system has a number of application programs that can manipulate with files.
The application programs are written by system programmers in response to the needs of the
organization. New programs are added to the system as the need arises. As a result, new
permanent files are created. The file processing system is supported by a conventional operating
system. Permanent records are stored in various files and different application programs are
written to extract records from and to add records to the appropriate files.
Disadvantages of Traditional File Approach
This approach has a number of major disadvantages:
1. Data redundancy and inconsistency
2. Difficulty in accessing data due to data isolation
3. Integrity problems
4. Concurrent access anomalies
5. Security problems
1. Data redundancy and inconsistency
In a file-based system, same information may be duplicated in several files. Redundancy leads to
higher storage and access cost. In addition, it may lead to data inconsistency. Data inconsistency
means various copies of the same data may not be the same.
Example: An educational institute maintains the details of its students. The registration officer
maintains the names of the students, date of joining and course to which they have registered.
The accounting officer maintains students name, course to which he is admitted, date of joining
and fee details. Though most of information required by both the officers is the same, some
details are not the same and separate files have to be maintained. Also some common details are
stored in duplicates.
This existence of the same data in multiple places or the replication of the same data or duplicate
data existence is known as Data redundancy.
This redundancy in storing the data in multiple files has some disadvantages:
In a file system, data are scattered in various files and the files may be in different formats.
Therefore, it is difficult to write new application programs to retrieve appropriate data in a
convenient and efficient manner.
3.
Integrity problems
The data values stored in the database must satisfy certain types of consistency constraints. The
constraints, for example, may be that, the type of the data field must be specified for each data.
Consider, the grade field can have only one character entry from A to C in the university
database. Another type of constraint applies to the uniqueness of data. For example, Student
registration number of every student must be unique. Employee code for each employee in an
organization must be unique. Developers enforce these constraints in the system by adding
appropriate code in the various application programs. However, when new constraints are added,
it is difficult to change the programs to enforce them. The problem is compounded when
constraints involve several data items from different files.
4.
Concurrent-access anomalies
To improve the overall performance of the system and for a faster response time, many systems
allow multiple users to update the data simultaneously. In such an environment, interaction of
concurrent updates may result in inconsistent data. Because many different application programs
that have not been coordinated previously may access data, it is difficult to provide supervision.
5.
Security problems
Not every user of the database system should be able to access all the data. In order to maintain
security of data, access to the data must be controlled. For example, in a banking system, payroll
personnel should be able to view information about bank employees only. They need not access
information about customer accounts.
1.4. CHARACTERISTICS OF A DATABASE APPROACH
A number of characteristics distinguish the database approach from the traditional file approach.
In the database approach, there is a single repository where data are maintained. This database is
defined once and is then accessed by various users.
The main characteristics of the database approach are:
i) Self describing nature of a database system
ii) Program data independence and data abstraction
iii) Support of multiple views of data
iv) Sharing of data and multi-user transaction processing
(i) Self describing nature of the database
In a database approach, the database system contains not only the database but also the complete
definition of the database. The definition includes structure of the file, the type and storage
format of each data item and is stored in the system catalog. The information stored in the
catalog is referred to as metadata and it describes the structure of the primary database.
DBMS software refers to the catalog to know the structure of any database. The DBMS software
can access various databases by extracting the database definition from the catalog and corelating the database with the definition. The definitions of all the files are stored in the catalog.
When a request is made to access the name field of a student record, the DBMS software refers
to the catalog to first find the structure of the student file and then the size and the location of the
name field within the student record.
(ii) Program Data Independence and Data Abstraction
In a DBMS environment, if a new field is to be added to the student file, only the student file
definition in the catalog has to be changed and no change is needed in the programs that access
the database. This is known as Program Data Independence. The next time a DBMS program
refers to the catalog, the new structure of student record will be accessed and used
Many other details of file storage organization such as access paths on a file can be hidden from
database users by the DBMS. Database users and application programs refer to the conceptual
representation of files and DBMS extracts the details of file storage from the catalog whenever
required. This provides what is called Data abstraction to the database users.
The properly that allows program data independence and program operation independence is
called Data Abstraction.
Therefore, DBMS provides the user with only the conceptual representation of the data, which
does not include many of details of how the data is stored or how the operations are
implemented.
(iii) Support Of Multiple Views Of Data
A database will usually have multiple users, where different users may need different views,
different parts of the database. A view, i.e. the required part may be directly available in the
database (stored) or it may have to be derived from the existing data.
(iv) Sharing Of Data and Multi-User Transaction Processing
In a database approach, a single repository of data is maintained by integrating data from
multiple applications. Different users can share the data from the same database with a multi user
DBMS. It is required that multiple users access the same data at a time. For this, DBMS must
have a concurrency control to ensure that several users trying to update the same data do so in a
controlled manner so that the result of the updation process is correct. For example, in any flight
reservation system, the reservation clerks should not try to allot the same seat to different
persons. DBMS should take care to see that only one clerk would have the access to the seat at a
time. These are called On Line Transaction Processing (OLTP) applications. Example refer to
RwandaAir booking System.
The person controlling and coordinating the DBMS system is called the Database Administrator
(DBA).
The responsibilities of the DBA can be listed as follows:
1.
The DBA administers and sets up the definition of the global view or conceptual level of
the database. The DBA further specifies the external view of the various users and
applications. He is responsible for the definition and implementation of the internal level,
including the storage structure and access methods to be used for the optimum performance
of the DBMS.
2.
The DBA should ensure that appropriate measures are in place to maintain the integrity
and/or reliability of the database and that the database is not accessible to unauthorized
users. The DBA is responsible for granting permission to the users of the database and
stores the profile of each user in the database. This profile describes the permissible
activities of a user on that portion of the database accessible to the user through one or more
user views. The user profile can be used by the database system to verify so that a particular
user performs a given operation on the database.
3.
The DBA is also responsible for defining procedures to recover the database from failures
due to human, natural, or hardware causes with minimal loss of data. This recovery
procedure should enable the organization to continue to function and the intact portion of
the database should continue to be available.
End users are the people who access the database for querying (retrieving data for a specific
requirement), updating and generating reports. There are different categories of end users. They
are:
i) Casual end users
ii) Nave or parametric end users
iii) Sophisticated end users
iv) Stand alone users
i)
They are the end users who occasionally access the database. Each time, they may require
different information from the database. They use sophisticated database query language to
specify their requests. They need to know only few facilities, which they are concerned with and
may be used repeatedly.
ii) Nave or parametric end users
Most number of database end users is the nave or parametric end users. They constantly make
use of the database, querying and updating the database. They utilize standard types of
queries/commands and updates called canned transactions, which are carefully programmed
and tested. Nave users need not know much about the facilities provided by the DBMS. They
only have to understand the user interface of the standard transactions designed and implemented
for their use.
For example, Bank tellers do the functions of checking account balances and posting the
transactions like withdrawals and deposits.
Reservation clerks for Airlines, Railways and hotels check for the availability of the facilities
may be seats or rooms and then accordingly make reservations.
iii) Sophisticated end users
They include engineers, scientists, business analyst, who thoroughly familiarize with the
facilities of DBMS to meet their requirements. They learn most of the DBMS facilities in order
to achieve their complex requirements.
In the database approach, to avoid this redundancy, we have a database which stores each data
item such as students name or birth date in only one place. This avoids inconsistency by saving
effort, storage area and by providing efficient updation.
1.6.2. Restricting Unauthorized Access
In order to maintain the security of the data, access to the database has to be controlled in an
environment where multiple users share a large database. Usually, financial data is supposed to
be confidential and for such databases users must be authorized to access it. Retrieving data and
updating the data are two different types of access. Some users may be only authorized to
retrieve the data and not to update it. Some may be authorized to do both. Typically, users or user
groups will be given some account numbers protected by passwords, through which the database
can be accessed.
The same strategy can be applied to DBMS software, where nave users are permitted only to
retrieve the database by canned transactions.
1.6.3. Providing Storage Structures For Efficient Query Processing
Database systems must provide capabilities for efficiently execution queries and updates.
Auxiliary files called indexes are used by DBMS to speed up disk search for the desired records.
Also buffering module of DBMS maintains parts of the database in main memory buffers. The
query processing and optimization module of the DBMS is responsible for choosing an efficient
query execution plan for each query based on the existing storage structures.
1.6.4. Providing Backups and Recovery
In a DBMS, always there should be a facility to recover from hardware and software failures.
This facility is called the Backup and recovery subsystem of the DDBMS. For example, if the
system fails when an updating program is under execution, the recovery system should see that
the database is restored back to its initial state i.e., the stage before starting the execution.
Alternatively, the recovery subsystem could ensure that, the program is resumed from the point
at which it was interrupted. Also, the system should provide automatic backup facility in order to
avoid power failure problems.
Sometimes, incorrect data may also satisfy the constraints. Ex. The grade B may be wrongly
entered, as C, but still it is acceptable to the constraints and this type are unavoidable. Manually
tracing the error and then correcting the database can only avoid these problems.
1.7. WHEN DBMS SHOULD NOT BE USED
DBMS works out costlier under following circumstances:
High initial investment in hardware, software and training.
Generality that a DBMS provides for defining and processing data.
Overhead for providing security, concurrency control, recovery and integrity functions.
The database and its applications are simple, well defined and not expected to change.
There are stringent real time requirements for some programs that may not be met
because of DBMS overhead.
Multiple user access to data is not required.
CHAPTER 2
DATABASE SYSTEM MODELS AND ARCHITECTURES
2.1. INTRODUCTION
Earlier DBMS software package was one integrated system where large centralized mainframe
computers were used. The modern DBMS packages are modular in design, with a client/server
architecture where hundreds of distributed workstations and personal computers are connected
via communications network. There are various types of servers like Web servers, database
servers, file servers, application servers and so on. A client module will run on a user workstation
or personal computer. Typically, application programs and user interfaces that access the
database run in the client module. The client module handles user interaction and provides the
user-friendly interfaces such as forms or menu based GUIs. The server module handles data
storage, access, search and other functions. The study of data models, schemas and instances is
very important to understand database systems.
2.2. DATA MODELS, SCHEMAS, AND INSTANCES
2.2.1 Data Model
A data model is a collection of concepts that can be used to describe the structure of the database.
It is a type of data abstraction that is used to provide conceptual representation of data. It uses
logical concepts that can be easily understood.
A data model is a set of concepts that can be used to describe the structure of a database, which
includes data types, relationships and data constraints etc. It also includes a set of basic
operations for specifying retrievals and updates on the database. Also dynamic aspect or behavior
of a database application is included in a data model.
Categories of Data Models
Data models are categorized based on the types of concepts that they provide to describe the
database structure.
i) High-level or conceptual data model
These data models represent data by using record structure and hence they are also called as
record-based data models.
2.2.2 Schemas And Instances, Database State
The database description is called as Database Schema (metadata). The database schema is
defined during the database design only and it is not expected to change frequently. Many of the
data models provide the diagrammatic representation of the schema, known as the Schema
diagram. Here, only the database structure is displayed without any actual data. Each object in
the schema is called a Schema construct, such as student, course, section, etc.
STUDENT
NAME
STUDNO
MOBILE
MARKS
COURSENO
CLASS
MARKS
COURSENO
SEMESTER
YEAR
SECTIONID
GRADE
COURSE
COURSENAME
SECTION
SECTIONID
GRADE_REPORT
SECTIONID
PRE_REQUISITE
COURSENO
PRE_REQUISITENO
1. The Internal Level: This consists of an internal schema, which describes the physical
storage structure of the database. It is the lowest level of abstraction. It does not hide the
storage details. It contains the definition of the stored record, the method of representing
the data fields and the access aids used. This internal schema uses a physical data model
and describes the complete details of data storage and access paths for the database. It is
also called as the Physical Schema.
2. The Conceptual Level: This has a conceptual schema, which describes the structures of a
database for a group of users. This schema hides the storage details from the user and it
includes description of entities, data types, relationships, user operations and constraints.
The description of data at this level is in a format independent of its physical
representation.
3. The External Level: This has a number of external schemas or user views. Each external
schema describes the part of the database that a particular user group is interested in and
hides all the other details from this group (all other databases). This is at a highest level
of database absorption.
End Users
External
External view 1
Level
External/Conceptual
Mapping
..
External view N
Conceptual Level
Conceptual schema
Conceptual/Internal
Mapping
Internal Level
Internal schema
Stored
Database
External/Conceptual Mapping
Conceptual/Internal Mapping
External/Conceptual Mapping
The mapping between external and conceptual level, gives the correspondence between records
and relationships of external and conceptual levels.
Conceptual/Internal Mapping
The mapping between conceptual and internal levels gives the correspondence between the
structure (metadata) and the size and position of the data in the stored database.
So the three level schema architecture is a convenient tool with which the user can visualize the
schema levels in a database system.
2.4 DATA BASE SYSTEM ARCHITECTURES
2.4.1 CENTRALIZED ARCHITECTURE
With the use of personal computers and workstations, the database systems were centralized in
which all the DBMS functionality, application program execution and user interface processing
were carried out on one machine. Figure 2.4 shows the physical components in a centralized
architecture.
In a centralized database system, always the data and the DBMS software reside only at one
place or site. Other sites, which are connected to this centralized site can access the database. In a
centralized system if the central system fails, then the entire system will be halted.
Display
Display
Monitor
Monitor
Display
.
Monitor
Network
Application
Programs
Terminal Display
Text
Editor
control
..
DBMS
Compilers
Controller
Controller
Controller
CPU
Memory
Disk
I/O devices
(printers, tape
drives..)
4. Improved performance
Naturally distributed data
In some cases data is naturally distributed over different areas. For example, a company may
have its head quarters in one place and its branches may be situated in some other places. In such
cases, the different sites maintain only their local data, coming under their limitations. This is
just like a small mini world, having all the features of the database, but only local data. Only
the global users, such as the head quarters will refer to these local sites, whenever need arises.
Reliability and Availability
In a distributed system, because the data is distributed over different sites, even if one site fails,
the system will continue working with other sites. Only the data under the failed site will not be
accessible. Thus Distributed DBMS are more reliable, compared to centralized systems. Also,
since each site handles only limited amount of local data, the system is relatively less busy. This
increases the probability of the system being available continuously at any given time interval.
Duplicating the data and the software at different sites also improves the performance.
The Distributed DBMS software must have some additional functions such as:
Ability to access remote sites through the network
Keeping track of data distribution and replication in the DBMS catalog
Defining strategies for queries, which share the data from different sites.
Ability to maintain the consistency of replicated data at different sites.
In a DDBMS, the data, which logically belongs to a single database, is physically distributed
over different sites. The sites may be located close to each other, say in adjacent buildings,
connected by Local Area Network (LAN) or they may be located geographically apart,
connected by long haul network (connected through telephone line or satellites). Physically, it is
achieved by a model called Client-server architecture.
2.4.3 CLIENT-SERVER ARCHITECTURE
The client/server architecture was developed to deal with computing environments in which a
large number of PCs, workstations, file servers, printers, database servers, web servers and other
requirement are connected via a network.
In client server architecture, number of personal computers, workstations, servers and printers
are involved. Each server is assigned a specific functionality. For example, software server stores
the software required by all its clients, whereas the client machine can only store the programs
and for executing these programs, they should get connected to the server. Similarly we have file
server that maintains the files of the client machines. We have the printer server which is
connected to various printers and all print requests by the clients are forwarded to this machine.
The resources provided by specialized servers can be accessed by many client machines. The
client machines provide the user with the appropriate interfaces to utilize these servers, as well as
with local processing power to run local applications.
There are two client/server frameworks.
Two-tier
Three-tier
2.4.3.1 TWO-TIER CLIENT/SERVER BASED ARCHITECTURE
In two tier architecture, the software components are distributed over two systems: the client and
the server.
Figure 2.5 shows client/server architecture at the logical level.
Client
Client
Client
File
DBMS
Server
Server
Server
..
SERVER
CLIENT
Site 1
CLIENT
Site 2
SERVER
Site 3
omm
Communication Network
CLIENT
Site n
Application Programs
Application Server
Web pages
or web Server
Database Management
Database Server
System
The web server plays the intermediary role by storing business rules (procedures or constraints)
that are used to access data from the database server. It checks the clients credentials before
forwarding a request to the database server. The intermediate server accepts requests from the
client, processes the request and sends database commands to the database server, and then acts
as a conduit for passing processed data from the database server to the clients, where it may be
processed further and filtered to be presented to users in GUI format. The user interface,
application rules and data access acts as the three tiers.
2.5. Type and Use of the Database Interface
Following, two important uses of a database interface like SQL are listed:
Interactive
Embedded
5. Object-relational
Relational Model
The principles of the relational model were first outlined by Dr. E.F. Codd. The Relational Data
model represents a database as a collection of tables, where each table can be stored as a separate
file. Most relational databases have high-level query languages and support a limited form of
user views. RDBMS relational database management system is based on the relational model
developed by E.F. Codd. A relational database allows the definition of data structures, storage
and retrieval operations and integrity constraints. In such a database, the data and relations
between them are organized in tables. A table is a collection of records and each record in a table
contains the same fields.
Term
Definition
Example
Entity
Attribute
A property of an entity
Relationship
Name
Designation
Qualification
Department
Salary
E001
Vani
Manager
PG
R&D
25000
E002
Preetha
Ass. Manager
PG
Accounts
20000
E003
Aditi
Supervisor
PG
Production
20000
E004
Vidya
Assistant
Diploma
Production
5000
The entity is Employee. The attributes are Emp_code, Name, Designation, Qualification,
Department and Salary. Vidya is a data item for the attribute Name. PG is a data item for the
field Qualification.
Properties of Relational Tables
Values are atomic
Each row is unique
Column values are of the same kind
The data
model is a simple network, and link and intersection record types (called junction records by
IDMS) may exist, as well as sets between them. Usually, a set defines a 1:M relationship,
although 1:1 is permitted.
STUDENT
COURSE
SECTION
GRADE-REPORT
PRE-REQUISITE
Bank-Author
Book
Author
Book-
Branch
Branch
Figure 2.5 Network database model
A network database consists of two data sets
A set records
A set links
The record types are made up of fields in the usual way. Links may be followed in their
direction.
Characteristics of Network Model
1. Data is organized into logical units called record types
2. Record types can have more than one owner (each data grouping can belong to more than
one category)
3. There are many efficient entry points into the database
Hierarchical Model
In a hierarchical data model, data is organized into a tree-like structure. Each hierarchy
represents a number of related records. There is a hierarchy of parent and child data segments.
This structure implies that a record can have repeating information, generally in the child data
segments. Data in a series of records, which have a set of field values attached to it.
It collects all the instances of a specific record together as a record type. These record types are
the equivalent of tables in the relational model, and with the individual records being the
equivalent of rows. To create links between these record types, the hierarchical model uses
Parent Child Relationships. These are a 1:N mapping between record types.
For example, an organization might store information about an employee, such as name,
employee number, department, salary. The organization might also store information about an
employees children, such as name and data of birth. The employee and children data forms a
hierarchy, where the employee data represents the parent segment and the children data
represents the child segment. If an employee has three children, then there would be three child
segments associated with one employee segment. In a hierarchical database the parent-child
relationship is one to many. This restricts a child segment to having only one parent segment.
The first database systems developed were hierarchical. Hierarchical DBMSs were popular from
the late 1960s, with the introduction of IBMs Information Management System (IMS) DBMS,
through the 1970s.
The language used for hierarchical model is usually a record-at-a-time language.
Characteristics of Hierarchical Data model
1. Data is grouped into logical units called segment types.
2. Segment types are linked in a parent/child hierarchical where each child segment type has
only one parent segment type.
3. There is only one efficient entry point.
4. Database use physical pointers to link records.
Advantages
1. The structure of a hierarchical database is easy to understand. The hierarchy is based on
logical relationships between parent and child segment types.
2. They provide very efficient high speed retrieval.
Disadvantages
1. There are difficult to modify as an organizations data needs change.
The Figure 2.6 represents a hierarchical database at a typical school where each department
employs many faculties and each faculty advises many students.
Department
Faculty
Faculty
Student
Student
Hierarchical
Network
Data Grouping
segment types
record types
Relationship Between
Grouping Types
(owner)
Data Access
Number of Parents
Distributed
1) Database is maintained at a number of
different sites.
2) If one system fails system continues work
with other sites.
3) More reliable
2.
3.
What are types of representational models? Name any three popular models.
4.
5.
6.
7.
8.
What is metadata?
9.
CHAPTER 3
4. Physical design
Also each employee may have one or more dependents. These details are to be maintained for
insurance purposes. The dependents name, sex, birth date and relationship to the employee are
maintained.
The Company Database will have the following relations with their respective attributes.
Department: DeptNo, DeptName, Dlocation, Manager
Project: ProjNo, ProjNName, Plocation
Emmployee: EmpId, EmpName, Address, Sex, Salary, DOB
Dependent: DepName, gender, DOB, Relnship
INITIAL CONCEPTUAL DESIGN OF THE COMPANY DATABASE:
Now we can identify four entity types for the above company database, namely department,
project, employee and dependent with the following features:
1. Entity type: DEPARTMENT
Attributes: DName, DNumber, DLocations, DManager
Key attributes: DName and DNumber
Multivalued attribute: DLocations
2. Entity type: PROJECT
Attributes: PName, PNumber, PLocation and Pmanager
Key attribute: PName, PNumber
3. Entity type: EMPLOYEE
Attributes: eName, eSN, eSex, eAddress, eSalary,eBirthdate, Department and supervisor.
Address may be a composite attribute.
4. Entity type: DEPENDENT
Attributes: Empployee, dpid, DependentName, Sex, Birthdate and Relationship to the
employee.
Since each employee may work on different projects, number of hours worked on each project
may be represented by a multivalued composite attribute of employee called WorksOn, with
simple components, Project and Hours (Project, Hours)}.
STUDENT
Attributes: Name
Age
Value:
Rogers
20
Address
12, Kicukiro
COMPANY
Name
MTN
H.Q.
President
Kagali
Rogers
Attributes that cannot be further subdivided are called simple or atomic attributes.
Consider the address attribute of student entity.
For example Regno, Deptno, CourseID are three attributes which cannot be subdivided for
further divided and are called simple attributes or Atomic attributes.
(ii)
Composite attribute
A composite attribute can be divided into smaller subparts, which represent more basic attributes
with independent meanings. Composite attributes are used, only if there is a need to refer to the
attribute as a whole, as well as part by part.
For example the Address attribute can be sub-divided into House-number, Street_Name, Area,
City, and Country.
Another example would be the Name attribute which can be composed of FirstName,
MiddleName and LastName.
(iii) Single valued attribute
An attribute that can take only one value at a time is called Single valued attribute. Usually, for a
particular entity, each attribute will have a single value.
For example the Age attribute will have a single value.
Stored attribute
The value of certain attributes cannot be obtained or derived from some other attributes. They
cannot be derived. Such attributes are said to be stored attributes.
Two or more attributes may sometimes be related. For example, age and birthdate are two
attributes that are related. If birthdate is known, with the help of current todays date, age can be
determined.
Birthdate is then said to be stored attribute.
(vi) Derived attribute
If the value of an attribute can be derived from some other attributes, then such attributes are
called derived attributes. Since age can be determined by knowing birthdate and current date, the
age attribute is said to be derived attribute.
A particular attribute is not applicable at all for the entity. Ex. For undergraduate
students, the attribute degree is not applicable at all. In that case, a null value may be
assigned to that particular attribute.
2.
The value of the attribute is not known, but it is applicable. Ex. If the age of a particular
student is not known, then also a null value can be assigned to it.
For example PhoneNo attribute may contain null value in an employee entity set as some
employees may not have telephone number.
Also email attribute may contain null value in an employee entity set as some of the employees
may not have an email id.
3.4. RELATIONSHIP
3.4.1. Relationship Definition
A relationship is an association among two or more entities. A relationship captures how two or
more entities are related to one another. Relationships can be thought of as verbs, kinking two or
more nouns.
Examples:
(i) An owns relationship between a company and a computer
(ii) a supervises relationship between an employee and a department
(iii)a performs relationship between an artist and a song
(iv) a proved relationship between a mathematician and a theorem
Relationships are represented as diamonds, connected by lines to each of the entities in the
relationship in an ER diagram.
3.4.2. Relationship Types, Sets and Instances
Relationship Types
A relationship type R among n entity types E1, E2,..defines a set of associations among
entities from these types.
Example: Relationship type work-for between two entities of company database, namely
EMPLOYEE and DEPARTMENT.
3.4.3. Relationship between Entity
The relationship between entity sets are of four types. They are:
(i) One-to-many (1:M),
(ii) Many-to-one (M:1),
Many-to-many (M:N),
(iii)One-to-one (1:1).
One-to-one relationship (1:1 relationship)
The 1:1 relationship between entity sets E1 and E2 indicates that for each entity in either set
there is at most one entity in the second set that is associated with it.
Example: Relationship between Department and Manager.
Department
Manager
Figure 3.5 One-to-one relationship
The relationship between a Department and a Manager is usually one-to-one; there is only one
manager per department and a manager manages only one department. There could be times
when a department is without a manager or when an employee who is classified as a manager
may be without a department to manage. It does not imply that for an occurrence of an entity
from one set at any time there must be an occurrence of an entity in the other set.
One-to-many relationship (1:M relationship)
The 1:M relationship from entity set E1 to E2 indicates that for an occurrence of the entity from
the set E1, there could be zero, one, or more entities from the entity set E2 associated with it.
Each entity in E2 is associated with at most one entity in the entity set E1.
Example: Relationship between Manager and Employee.
Manager
Employee
A 1:M relationship exists from the entity Manager to the entity Employee because there are
several employees reporting to the manager. There could be an occurrence of the entity type
Manager having zero occurrences of the entity type Employee reporting to him or her.
Many to One Relationship (M:1)
Employee
Manager
Employee
Project
Figure 3.8 Many to Many relationship
Each employee could be involved in a number of different projects, and a number of employees
could be working on a given project.
TEACHER
JB
PRINCE
OLIVIER
JOHN
TEACHES
r1
r2
r3
r4
r5
r6
CLASS
I BBIT
II BBIT
III BBIT
Associative
Fundamental
Entity
Attribute
ENTITY
Weak
entity
Entity types that do not have key attributes of their own are called weak entity types.
Attribute
Key
Derived
attribute
A derived attribute is based on another attribute. For example, age is a derived attribute that can
be derived from dateofbirth and current date.
Relationsh
Relationship
0 or 1
1 or more
0,1, or more
many
In the following example each student fills one seat in a class. Each seat is filled by one student.
(In this usage a seat implies not only a physical place to sit but also a specific day and time).
This is one-to-one relationship.
Student
Seat
Fills teach several courses. Each course has only one
In the next example a single instructor may
instructor.
This is a one-to-many relationship.
Instructor
Teache
Course
As shown below, a single student may register for several courses. A single course can have
many students enrolled in it. This is the many-to-many relationship.
Student
Takes
Course
The next example shows a relationship in which it is possible that no instances exist. Each
professor may teach several course sections but may not teach at all. Assume there is no team
teaching; therefore each section must have a single professor.
Professor
Teache
Section
Finally, a more complex example which shows more than one relationship. All of the examples
above depict single relationships. An actual E-R diagram would show the many entities and
relationships that exist within a system. Here each department offers at least one course; there is
no cross-listing of courses with other departments. Each course must have at least one section but
often has several sections.
Professor
Teache
Section
Each relationship instance in WORKS FOR associates one employee with one department
entity. This is shown in the figure, where relationship instances e1, e2 work for Department d1,
relationship instances e2, e6 and e7 work for department d3. - _
Proper Naming of Schema Constructs
While designing a database schema, names for entity types, attributes, relationship types and
roles should be chosen such that they convey meanings attached to the different constructs in the
schema, as much as possible.
Entity type names are chosen to be singular. Entity type and relationship type names are in
uppercase letters.
In a narrative description of the database requirements, nouns give rise to entity type names
and verbs tend to indicate names of relationship types.
Attribute names arise from additional nouns describing the entity nouns. Binary relationship
names are listed left to right or bottom to top in an ER diagram.
The Entity-Relationship diagram for the relation works for can be shown as follows:
SSN
Salary
Salary
Addres
EMPLOYEE
Departme
Code
Numb
VENDOR
Address
The data types include standard numeric data types for integers and real numbers. There are also
characters, Booleans, fixed-length strings and variable length strings, date, time, timestamp and
so on.
Other possible domains may be described by a subrange of values from a data type or as an
enumerated data type in which all possible values are explicitly listed.
Key constraints
All tuples inn a relation must be distinct. No two tuples have the same combination of values for
all their attributes
A super key SK specifies a uniqueness constraint that no two distinct tuples in any state r of
relation R can have the same value for SK. Every relation has at least one default super key
which is the set of all its attributes. A super key can have redundant attributes.
A key K of a relation schema R is a super key of R with the additional property that removing
any attribute A from K leaves another set of attributes K that is not a super key of R any more.
A key satisfies two constraints.
i. Two distinct tuples in any state of the relation cannot have identical values for all the
attributes in the key.
ii. It is a minimal super key, i.e. a super key from which we cannot remove any attributes
and still have the uniqueness constraint hold.
Key constraint is a constraint that should hold on every valid relation state or the schema. The
value of a key attribute can be used to identify uniquely each tuple in the relation. A relation is a
set of tuples, where each tuple is distinct i.e. no two tuples can have the same values for all the
attributes. Usually there will be at least one set of attributes, for which two tuples will not have
same combinations of values and such a set of attributes is called the key of the relation schema
R.
In the student relation, REGNO can be considered as a key because no two tuples can have the
same value for REGNO. This key can be used to identify the tuples, since it is unique. This key
should maintain its uniqueness property, even when some more tuples are added to the relation,
i.e., the property is time invariant. Fields such as name should not be declared as keys, because
name may not be unique.
Sometimes a relation schema may have more than one key. Each of the keys is called a
candidate key. One of the candidate keys can be chosen to be the primary key which is used to
identify tuples in the relation. It is better to choose a primary key with a single attribute or a
small number of attributes.
Constraints on Null Values
Another constraint on attributes specifies whether null values are not permitted. The constraint is
specified as NOT NULL.
Integrity constraints are specified on a database schema and are expected to hold on every
valid database state of that schema.
The constraints that are part of relational model are
Entity integrity constraint
Referential integrity constraint
Entity Integrity
It states that no primary key value can be null. This is because this value only identifies the
individual tuples distinctly in a relation. Null values in two or more tuples means they cannot be
identified and distinguished.
Key constraints and entity integrity constraints are specified on individual relations.
Referential integrity
It is specified between two relations and is used to maintain the consistency among tuples in the
two relations.
It states that a tuple in one relation that refers to another relation must refer to an existing tuple in
that relation.
Example:
DNO of EMPPLOYEE gives the department number for which each employee works. This
should match with the DNUMBER value of some tuple in the DEPARTMENT relation.
EMPLOYEE-SSN, NAME, BDATE, ADDRESS, SALARY, SUPERSSN, DNO
DEPARTMENT-DNUMBER, DNAME, MGRSSN, MGRSTARTDATE
To define referential integrity constraint more clearly, we should know what a foreign key is.
Foreign key
A set of attributes FK in relation schema R1 is a foreign key of r1 that references relation R2 if it
satisfies the following two rules.
i. The attributes in FK have the same domains (S) as the primary key attributes PK of R2.
ii. A value of FK in a tuple t1 of the current state r1 (R1) either occurs as a value of PK for
some tuple t2 in the current state r2 (R2) or is null.
R1 is called the referencing relation. R2 is called the referenced relation. If these two conditions
hold, the referential integrity constraint from R1 to R2 is said to hold.
Referential integrity constraints typically arise from the relationships among the entities
represented by the relation schemas.
In the EMPLOYEE relation, the attribute DNO refers to the department for which an employee
works, hence we call DNO a foreign key of EMPLOYEE, referring to the DEPARTMENT
relation. This means that a value of DNO in any tuple t1 of the EMPLOYEE relation must match
a value of the primary key of DEPARTMENTThe DNUMBER attribute in some tuple t2 of
the DEPARTMENT relation, or the value of DNO can be null, if the employee does not belong
to any department.
Sometimes a foreign key can refer to its own relation, the attribute SUPERSSN of EMPLOYEE
relation, indicating the Social Number of the supervisor under whom the employee works, the
supervisor himself being in the same relation EMPLOYEE. EX. SUPERSSN is a foreign key
that references the EMPLOYEE relation itself.
Most relational DBMSs support key and entity integrity constraints and also support referential
integrity.
Semantic Integrity constraints
These may have to be specified and enforced on a relational database. Such constraints can be
specified and enforced within the application programs that update the database or by using a
general purpose constraint specification language.
Mechanisms such as Triggers and Assertions can be used. However, it is more common to check
for these types of constraints within the application programs than to use constraint specification
languages because they are difficult and complex to use correctly.
Example: The salary of an employee should not exceed the salary of the employees supervisor.
Maximum number of hours an employee can work on all projects is 56
Functional dependency constraint
This establishes a functional relationship among two sets of attributes X and Y. This constraint
specifies that the value of X determines the value of y in all states of a relation. It is denoted as
XY. This can be used as a tool to analyze the quality of relational designs and to normalize
relations to improve their quality.
2.
3.
4.
5.
6.
7.
8.
9.