[go: up one dir, main page]

0% found this document useful (0 votes)
448 views65 pages

Database Design and Management Part 1,2,3,4,5

Data Based Management System notes was prepared by Dr. Mbanzabugabo Jean Baptise, the Dean of the Faculty of Applied Science and Technology at the University of Tourism Technology and Business Studies- Kigali Rwanda
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
448 views65 pages

Database Design and Management Part 1,2,3,4,5

Data Based Management System notes was prepared by Dr. Mbanzabugabo Jean Baptise, the Dean of the Faculty of Applied Science and Technology at the University of Tourism Technology and Business Studies- Kigali Rwanda
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 65

UNIVERSITY OF KIGALI- KIGALI MAIN CAMPUS

FACULTY OF POST GRADUATE , M.SC.IT

SUBJECT: DATABASE DESIGN AND MNAGEMENT SYSTEMS


Lecturer: Dr. MBANZABUGO Jean Baptiste (BE, MCA, MSc.SE, PhD)

CHAPTER-1. BASICS OF DATABASE SYSTEMS


Introduction, Definition of Data, Data base and DBMS Need for database Functions of
database Management Systems , Metadata and catalog File based Database Environment and
Database systems Component of Data base Systems.
Database and Database Users, Characteristics of the Database Approach, Different peoples
behind DBMS, and Advantages of using DBMS, When not to use a DBMS, Classification of
Database Management Systems.
CHAPTER-2. DATABASE SYSTEM ARCHITECTURES
Database Schemas and Levels, DBMS Architectures: Architecture of a database system types:
Personal computer system Centralized system Client / Server system Distributed system,
and Data Independence and interfaces.
CHAPTER-3. DATA MODELING USING ENTITY-RELATIONSHIP MODEL
Data Models, High level, conceptual data models for Database Design with a example, Entity
types, Entity sets, Attributes and Keys, ER Model Concepts, Notation for ER Diagrams, proper
naming of schema constructs, Relationship types, degree of relationship and Data Dictionary.
CHAPTER-4. RELATIONAL DATABASE LAGUAGES AND STRUCTURE QUERY
LANGUAGES
Database Languages, Queries: Simple query where clause group by order by clauses -sub
queries correlated sub-queries.
Relation Algebra, Functions and Operators:
Join, Number Functions - sin(), cos(), tan(), sqrt(), Aggregate functions count(), avg(),
Conversion functions to_char(), to_date(), Arithmetic, Logical and set Operators
CHAPTER-5 NORMALIZAION FOR RELATIONAL DATABASES
Normal Forms based on primary keys, General Definitions: First Normal Form, Second Normal
Form and Third Normal Forms, Boyce-Codd Normal Form

CHAPTER 6. DATABASE MAINSTREAMING MANAGEMENT AND PROTECTION


Transaction control statements:
Commit, rollback, savepoint
Triggers:
Trigger definition Trigger Events - Built-in Trigger User defined Trigger database triggers
Creating and using Views and stored procedures:
creating a view, querying a view of more than one table.
Database Security and control and countermeasures

CHAPTER 1
BASICS OF DATABASE SYSTEMS
1.1. INTRODUCTION
In our everyday life in the modern society, we come across databases and database systems quite
frequently. Several activities on a day-to-day basis require the interaction with one or another
database. For example, bank transactions like deposit or withdrawal of money, railway or airline
reservation, borrowing or returning a book from a computerized library, purchasing items from a
departmental store or through Internet, computerized hospital management etc. all involve
accessing and interacting with databases. Databases play an important role in almost all areas
where computers are used, including business, electronic commerce, engineering, medicine, law,
education and library science, to name a few.
Traditional Database applications
In the traditional database applications, most of the information stored and accessed is either
textual or numeric. Some traditional database applications are:
Banking
Inventory
Airline reservation
Library management
Hospital management
Advanced Database applications
Some of the advanced applications of database systems are
Multimedia databases-that can store pictures, video clips and sound messages.
Geographic information systems (GIS)- that can store and analyze maps, weather data
and satellite images
Data warehouses and Online analytical processing (OLAP)-that are used in many
companies to extract and analyze useful information from very large databases for
decision-making.

Real-time and Active database technology- that is used in controlling industrial and
manufacturing processes
Mobile Databases- They are available on the users portable computers. Users interact
with the mobile database application, which in turn accesses the data stored in the mobile
databases through the DBMS.
Web databases- databases are integrated with web to support business operations like ecommerce, supply chain management or web publishing
1.2. BASICS OF DATABASE TECHNOLOGY
DATA
Data is a collection of known facts, figures or statistics that can be recorded and that is
meaningful. Data can be text, numbers, audio, video or any combination of these. Data in its raw
form may not be useful for decision-making. It must be processed so as to become useful.
DATABASE
It is a collection of related data, having valid and implicit meaning and is designed to meet the
information needs of an organization.
Example 1: A personnel Telephone directory, which contains name, address and telephone
numbers, arranged in some particular order, say sorted according to name, acts as a collection of
related data having valid and implicit meaning and hence it is a telephone database.
Example 2: A college staff register, containing the data regarding name, date of joining, basic
salary, net salary etc. For each staff member, where arranged in some order, becomes a Staff
database.
Example 3: There can be a student database containing the details of the students like register
number, name, date of birth, fathers name, mothers name, marks details etc.
IMPLICIT PROPERTIES OF DATABASE
1. A database represents some aspect of the real world. A database has some source from
which data is derived. There will be some degree of interaction with events in the real

world. There will be multiple applications and multiple users who are interested in the
contents of the database.
2. A database is a logically coherent collection of data with some inherent meaning.
3. A database is designed, built and populated with data for a specific purpose.
4. A database can be of any size and any level of complexity. If huge amount of data is
involved, the data must be organized and managed, so that users can search for, retrieve
and update the data whenever needed.
5. A database may be generated and maintained manually or by a group of application
programs written specifically for that task or by a database management system.
DATABASE MANAGEMENT SYSTEM
A database management system (DBMS) is a collection of programs that enables users to create
and maintain a database. This is a software system that allows access of the data contained in the
database. The primary goal of a DBMS is to provide an environment that is both convenient and
efficient to use in storing and retrieving database information.
FUNCTIONS OF DBMS
DBMS is a general purpose software system that performs the following functions.
i) Defining a database
ii) Constructing the database
iii) Manipulating the database
iv) Sharing databases among various users
v) Protecting the database
vi) Maintenance of a database
(i) Defining a database
This involves specifying the data types, structures and constraints for the data to be stored in
the database.
(ii) Constructing the database
This is the process of storing the data on some storage medium that is controlled by DBMS.

(iii) Manipulating a database


This includes functions such as updating the database to reflect changes, querying the
database to retrieve specific data, and generating reports from the data.
(iv) Sharing a database
This function allows multiple users and programs to access the database concurrently i.e.
multiple users can use the database simultaneously without creating inconsistent database.
(v) Protection of database
This includes system protection against hardware or software malfunction (or crashes), and
security protection against unauthorized or malicious access.
(vi) Maintenance of database
Maintenance of the database by the DBMS allows the system to evolve as and when the
system requirements change over a period of time.
DATABASE SYSTEM
The database and the DBMS software together is a Database system. It not only contains the
database but also a complete definition or description of the database structure and
constraints. This definition is stored in what is called the DBMS catalog.
DBMS CATALOG
The catalog contains information such as the structure of each file, the type and storage
format of each data item and various constraints on the data.
META DATA
The information such as structure of each file, the type and storage format of each data item
and constraints on the data stored on the DBMS catalog is called meta-data. It describe the
structure of the primary database.

DATABASE SYSTEM ENVIRONMENT


The diagram of Figure 1.1 shows the typical database system environment. It consists of the
following:
1. Application programs or queries written by users or programmers
2. Software to process these queries or application programs
3. Software to access the stored data
4. DBMS catalog which contains the stored database definition or metadata
5. The physical stored database
First the data type, structure, constraints for the data specified by the database designer should be
stored in the DBMS catalog as stored database definition (metadata). This is called defining the
database. The physical location in memory where the database is stored is called Stored database.

DATABASE SYSTEM
Application Programs/Queries

DBMS SOFTWARE

Software to process queries/programs

Software to access stored data

Stored Database

Stored

Definition
Database
Figure 1.1 Typical Database System Environments
The application programs or queries that are written by the user or programmer are processed by
the DBMS software so as to perform the required function. The DBMS software can also
perform functions like storing the data in the specified structure, manipulating with the database
performing operations like updation, retrieval etc. Whenever request to access data is made, a
part of the DBMS software first refers to the metadata to access the structure and hence
determine the size and position so as to access data from the stored database.
1.3. TRADITIONAL FILE ORIENTED APPROACH

Earlier the information was maintained in permanent system files. To allow users to manipulate
the information, the system has a number of application programs that can manipulate with files.
The application programs are written by system programmers in response to the needs of the
organization. New programs are added to the system as the need arises. As a result, new
permanent files are created. The file processing system is supported by a conventional operating
system. Permanent records are stored in various files and different application programs are
written to extract records from and to add records to the appropriate files.
Disadvantages of Traditional File Approach
This approach has a number of major disadvantages:
1. Data redundancy and inconsistency
2. Difficulty in accessing data due to data isolation
3. Integrity problems
4. Concurrent access anomalies
5. Security problems
1. Data redundancy and inconsistency
In a file-based system, same information may be duplicated in several files. Redundancy leads to
higher storage and access cost. In addition, it may lead to data inconsistency. Data inconsistency
means various copies of the same data may not be the same.
Example: An educational institute maintains the details of its students. The registration officer
maintains the names of the students, date of joining and course to which they have registered.
The accounting officer maintains students name, course to which he is admitted, date of joining
and fee details. Though most of information required by both the officers is the same, some
details are not the same and separate files have to be maintained. Also some common details are
stored in duplicates.
This existence of the same data in multiple places or the replication of the same data or duplicate
data existence is known as Data redundancy.
This redundancy in storing the data in multiple files has some disadvantages:

Wastage of manual effort to store in multiple files


Wastage of storage area
Problem of updating
If course name is changed, it may so happen that it is changed in one file and not in other files.
Even if all corresponding files are updated, it may still be inconsistent because individuals may
not update them properly. For example, if date of joining is to be updated, in one file it may be
updated as 10/10/1998 and in another file it may be updated as 10/1/1998. This causes Data
inconsistency.
2.

Difficulty in accessing data due to Data isolation

In a file system, data are scattered in various files and the files may be in different formats.
Therefore, it is difficult to write new application programs to retrieve appropriate data in a
convenient and efficient manner.
3.

Integrity problems

The data values stored in the database must satisfy certain types of consistency constraints. The
constraints, for example, may be that, the type of the data field must be specified for each data.
Consider, the grade field can have only one character entry from A to C in the university
database. Another type of constraint applies to the uniqueness of data. For example, Student
registration number of every student must be unique. Employee code for each employee in an
organization must be unique. Developers enforce these constraints in the system by adding
appropriate code in the various application programs. However, when new constraints are added,
it is difficult to change the programs to enforce them. The problem is compounded when
constraints involve several data items from different files.
4.

Concurrent-access anomalies

To improve the overall performance of the system and for a faster response time, many systems
allow multiple users to update the data simultaneously. In such an environment, interaction of
concurrent updates may result in inconsistent data. Because many different application programs
that have not been coordinated previously may access data, it is difficult to provide supervision.

5.

Security problems

Not every user of the database system should be able to access all the data. In order to maintain
security of data, access to the data must be controlled. For example, in a banking system, payroll
personnel should be able to view information about bank employees only. They need not access
information about customer accounts.
1.4. CHARACTERISTICS OF A DATABASE APPROACH
A number of characteristics distinguish the database approach from the traditional file approach.
In the database approach, there is a single repository where data are maintained. This database is
defined once and is then accessed by various users.
The main characteristics of the database approach are:
i) Self describing nature of a database system
ii) Program data independence and data abstraction
iii) Support of multiple views of data
iv) Sharing of data and multi-user transaction processing
(i) Self describing nature of the database
In a database approach, the database system contains not only the database but also the complete
definition of the database. The definition includes structure of the file, the type and storage
format of each data item and is stored in the system catalog. The information stored in the
catalog is referred to as metadata and it describes the structure of the primary database.
DBMS software refers to the catalog to know the structure of any database. The DBMS software
can access various databases by extracting the database definition from the catalog and corelating the database with the definition. The definitions of all the files are stored in the catalog.
When a request is made to access the name field of a student record, the DBMS software refers
to the catalog to first find the structure of the student file and then the size and the location of the
name field within the student record.
(ii) Program Data Independence and Data Abstraction

In a DBMS environment, if a new field is to be added to the student file, only the student file
definition in the catalog has to be changed and no change is needed in the programs that access
the database. This is known as Program Data Independence. The next time a DBMS program
refers to the catalog, the new structure of student record will be accessed and used
Many other details of file storage organization such as access paths on a file can be hidden from
database users by the DBMS. Database users and application programs refer to the conceptual
representation of files and DBMS extracts the details of file storage from the catalog whenever
required. This provides what is called Data abstraction to the database users.
The properly that allows program data independence and program operation independence is
called Data Abstraction.
Therefore, DBMS provides the user with only the conceptual representation of the data, which
does not include many of details of how the data is stored or how the operations are
implemented.
(iii) Support Of Multiple Views Of Data
A database will usually have multiple users, where different users may need different views,
different parts of the database. A view, i.e. the required part may be directly available in the
database (stored) or it may have to be derived from the existing data.
(iv) Sharing Of Data and Multi-User Transaction Processing
In a database approach, a single repository of data is maintained by integrating data from
multiple applications. Different users can share the data from the same database with a multi user
DBMS. It is required that multiple users access the same data at a time. For this, DBMS must
have a concurrency control to ensure that several users trying to update the same data do so in a
controlled manner so that the result of the updation process is correct. For example, in any flight
reservation system, the reservation clerks should not try to allot the same seat to different
persons. DBMS should take care to see that only one clerk would have the access to the seat at a
time. These are called On Line Transaction Processing (OLTP) applications. Example refer to
RwandaAir booking System.

1.5. DIFFERENT PEOPLE BEHIND DBMS


For a small personal database, a person typically defines, constructs and manipulates the
database. However, many persons are involved in the design, use and maintenance of a large
database with a few hundred users. Their jobs involve the day-to-day use of a large database.
There are others who work to maintain the database system environment, but who are not
actively interested in the database itself.
There are two categories of people behind DBMS. They are:
i) Actors on the scene
ii) Workers behind the scene
(i) Actors on the scene
The persons involved in the design, use and maintenance of large database are:
Database administrator (DBA)
Database designer
End User
Software Engineer (System Analyst and Application Programmer)
(ii) Workers behind the scene
The persons who work to maintain the database environment but are not actively involved in
the database itself are:
DBMS system designers and implementers
Tool developers
Operators and maintenance personnel
1.5.1. Database Administrator (DBA)
In a database environment, primary resource is the database itself and the secondary resource is
the DBMS and its related software. In an organization, many users share these resources. Then
there arises a need to manage these resources. The DBA is responsible for authorizing access to
the database, coordinating and monitoring its use and for acquiring software and hardware
resources as needed. The DBA is accountable for problems such as breach of security or poor
system response time.

The person controlling and coordinating the DBMS system is called the Database Administrator
(DBA).
The responsibilities of the DBA can be listed as follows:
1.

The DBA administers and sets up the definition of the global view or conceptual level of
the database. The DBA further specifies the external view of the various users and
applications. He is responsible for the definition and implementation of the internal level,
including the storage structure and access methods to be used for the optimum performance
of the DBMS.

2.

The DBA should ensure that appropriate measures are in place to maintain the integrity
and/or reliability of the database and that the database is not accessible to unauthorized
users. The DBA is responsible for granting permission to the users of the database and
stores the profile of each user in the database. This profile describes the permissible
activities of a user on that portion of the database accessible to the user through one or more
user views. The user profile can be used by the database system to verify so that a particular
user performs a given operation on the database.

3.

The DBA is also responsible for defining procedures to recover the database from failures
due to human, natural, or hardware causes with minimal loss of data. This recovery
procedure should enable the organization to continue to function and the intact portion of
the database should continue to be available.

1.5.2. Database Designers


Database designers are responsible for identifying the data to be stored in the database and for
choosing appropriate structure to represent and store this data. It is the responsibility of the
database designers to interact with all the end users of the database and understand their
requirements and then design the database to fulfill all the user requirements. The database
designers interact with various user groups and develop various views of the database those cater
to the requirements of the individual groups. Finally all views are integrated to produce the final
database, which meets the requirements of all the user groups.
1.5.3. End Users

End users are the people who access the database for querying (retrieving data for a specific
requirement), updating and generating reports. There are different categories of end users. They
are:
i) Casual end users
ii) Nave or parametric end users
iii) Sophisticated end users
iv) Stand alone users
i)

Casual end users

They are the end users who occasionally access the database. Each time, they may require
different information from the database. They use sophisticated database query language to
specify their requests. They need to know only few facilities, which they are concerned with and
may be used repeatedly.
ii) Nave or parametric end users
Most number of database end users is the nave or parametric end users. They constantly make
use of the database, querying and updating the database. They utilize standard types of
queries/commands and updates called canned transactions, which are carefully programmed
and tested. Nave users need not know much about the facilities provided by the DBMS. They
only have to understand the user interface of the standard transactions designed and implemented
for their use.
For example, Bank tellers do the functions of checking account balances and posting the
transactions like withdrawals and deposits.
Reservation clerks for Airlines, Railways and hotels check for the availability of the facilities
may be seats or rooms and then accordingly make reservations.
iii) Sophisticated end users
They include engineers, scientists, business analyst, who thoroughly familiarize with the
facilities of DBMS to meet their requirements. They learn most of the DBMS facilities in order
to achieve their complex requirements.

iv) Stand-alone users


They maintain personal databases by using readymade packages. These packages provide easy to
use menu based or graphics based interfaces. For example, users of tax packages, financial
packages, etc. Store a variety of personal financial data for tax purposes. Such users become well
versed in using the particular software package.
1.5.4. System Analysts And Application Programmers
(Software Engineers)
System Analysts and Application Programmers are generally referred to as Software Engineers.
System analysts determine the requirements of end users, especially nave and parametric users
and develop specifications for canned transactions that meet these requirements.
Application programmers implement these specifications as programs, test them and debug,
document and maintain them.
These analysts and programmers must be aware of all the facilities provided by the DBMS, so as
to accomplish their tasks.
1.5.5. DBMS Designers And Implementers
These are the persons who design and implement the DBMS software package and implement it.
The DBMS software consists of many complex modules like modules for implementing the
catalog, processing query language, processing the interface, accessing and buffering data,
controlling concurrency and handling data recovery and ssecurity.
1.5.6. Tool Developers
Tools are software packages that facilitate database system design and usage and help in
improving performance. Tools include packages for database design, performance monitoring,
graphical interfaces, simulation, and test data generation. Tools are optional packages and are
often available separately.
Tools developers are the persons who design and implement such tools.

1.5.7. Operators and Maintenance Personnel


These are the system administration personnel who are responsible for the actual running and
maintenance of the hardware and software environment for the database system.
1.6. ADVANTAGES OF USING DBMS
A good DBMS should posses certain capabilities. With these capabilities, the DBA must achieve
a variety of objectives related to the design, administration, and use of a large multi-user
database.
1.6.1. Controlling Redundancy
In the traditional file approach, each user maintains his/her own file. For example, an educational
institute maintains the details of its students. Registration officer maintains the names of the
students, date of joining and course to which they have registered. The accounting officer
maintains students name, course to which he is admitted, date of joining and fees details. Though
most of the information required by both the officers is the same, but exactly it is not same.
Because of this, they have to maintain separate files. Name of the student, date of joining and
class are repeated in both the files, i.e. they are stored in duplicates. This existence of the same
data in multiple places or the replication of the same data or duplicate data existence is known as
Data redundancy.
This redundancy in storing the data in multiple files has some disadvantages.
There is wastage of manual effort to store it in multiple files.
There is wastage of storage area.
There is the problem of updation, i.e., if course name is to be changed, it may happen that
it is changed in one file and not in other files.
Even if updation is applied to all the files, it may still be inconsistent, because individuals
may not update it properly. Ex., if we want to update date of joining, one may update it as
10-10-1998 to 01-10-1998 and other one may change it to 10-01-1998.

In the database approach, to avoid this redundancy, we have a database which stores each data
item such as students name or birth date in only one place. This avoids inconsistency by saving
effort, storage area and by providing efficient updation.
1.6.2. Restricting Unauthorized Access
In order to maintain the security of the data, access to the database has to be controlled in an
environment where multiple users share a large database. Usually, financial data is supposed to
be confidential and for such databases users must be authorized to access it. Retrieving data and
updating the data are two different types of access. Some users may be only authorized to
retrieve the data and not to update it. Some may be authorized to do both. Typically, users or user
groups will be given some account numbers protected by passwords, through which the database
can be accessed.
The same strategy can be applied to DBMS software, where nave users are permitted only to
retrieve the database by canned transactions.
1.6.3. Providing Storage Structures For Efficient Query Processing
Database systems must provide capabilities for efficiently execution queries and updates.
Auxiliary files called indexes are used by DBMS to speed up disk search for the desired records.
Also buffering module of DBMS maintains parts of the database in main memory buffers. The
query processing and optimization module of the DBMS is responsible for choosing an efficient
query execution plan for each query based on the existing storage structures.
1.6.4. Providing Backups and Recovery
In a DBMS, always there should be a facility to recover from hardware and software failures.
This facility is called the Backup and recovery subsystem of the DDBMS. For example, if the
system fails when an updating program is under execution, the recovery system should see that
the database is restored back to its initial state i.e., the stage before starting the execution.
Alternatively, the recovery subsystem could ensure that, the program is resumed from the point
at which it was interrupted. Also, the system should provide automatic backup facility in order to
avoid power failure problems.

1.6.5. Providing Multiple User Interfaces


Since different types of users with different levels of technical knowledge use a database, DBMS
must provide variety of interfaces. These include query language for the casual users,
programming language interface for application programmers, forms and commands codes for
parametric users and menu driven interfaces and natural language interfaces for stand-alone
users. Graphical User Interfaces (GUI) consists of both forms-style interfaces and menu-driven
interfaces. These GUIs can be specified with the help of specialized languages and environments.
1.6.6. Representing Complex Relationships Among Data
A database may include numerous varieties of data that are interrelated in many ways. A DBMS
must have the capability to represent a variety of complex relationships among the data as well
as to retrieve and update related data easily and efficiently.
1.6.7. Enforcing Integrity Constraints
A database always has certain integrity constraints to be followed.
The constraints that can be enforced are
i) The type of the data field must be specified for each data.
ii) The width of data should be specified i.e. the number of characters. For example, Name
field in University database should contain no more than 25 alphabetic characters or
Grade field can have only one character entry and only A to C.
iii) A record in one file must be related to record s in other files. For example, every section
name must be related to a course name.
iv) Another type of constraint applies the uniqueness of data. Ex. Register_no must be
unique for every student in a student database.
The database designers must incorporate these constraints in the design stage of the database.
Some constraints can be directly enforced by the DBMS software itself and some have to be
enforced through update programs or while entering the data.

Sometimes, incorrect data may also satisfy the constraints. Ex. The grade B may be wrongly
entered, as C, but still it is acceptable to the constraints and this type are unavoidable. Manually
tracing the error and then correcting the database can only avoid these problems.
1.7. WHEN DBMS SHOULD NOT BE USED
DBMS works out costlier under following circumstances:
High initial investment in hardware, software and training.
Generality that a DBMS provides for defining and processing data.
Overhead for providing security, concurrency control, recovery and integrity functions.
The database and its applications are simple, well defined and not expected to change.
There are stringent real time requirements for some programs that may not be met
because of DBMS overhead.
Multiple user access to data is not required.

REVIEW AND SELF ASSESSMENT QUESTIONS


1. Define the terms:
i) Data
ii) information
iii) Database
iv) DBMS
v) Database system
vi) Meta data
vii) DBMS Catalog
2. List any three implicit properties of Database.
3. Give any three examples of Databases
4. Explain the functions of DBMS
5. Draw a neat diagram showing the simplified database system environment and explain
briefly.

6. Explain the disadvantages of traditional file approach.


7. Explain the characteristics of database approach.
8. Explain the following:
i) Program Data independence
ii) Data Abstraction.
9. Expand DBA. What is the role of a DBA?
10. Discuss about various people involved with DBMS.
11. Write a note on various types of End users using DBMS
12. Explain the advantages of DBMS
13. What are the circumstances when a DBMS should not be used?
14. Explain the responsibilities of DBA.
15. Explain overall structure of database system Environment with detailed diagram.

CHAPTER 2
DATABASE SYSTEM MODELS AND ARCHITECTURES
2.1. INTRODUCTION
Earlier DBMS software package was one integrated system where large centralized mainframe
computers were used. The modern DBMS packages are modular in design, with a client/server
architecture where hundreds of distributed workstations and personal computers are connected
via communications network. There are various types of servers like Web servers, database
servers, file servers, application servers and so on. A client module will run on a user workstation
or personal computer. Typically, application programs and user interfaces that access the
database run in the client module. The client module handles user interaction and provides the
user-friendly interfaces such as forms or menu based GUIs. The server module handles data
storage, access, search and other functions. The study of data models, schemas and instances is
very important to understand database systems.
2.2. DATA MODELS, SCHEMAS, AND INSTANCES
2.2.1 Data Model
A data model is a collection of concepts that can be used to describe the structure of the database.
It is a type of data abstraction that is used to provide conceptual representation of data. It uses
logical concepts that can be easily understood.
A data model is a set of concepts that can be used to describe the structure of a database, which
includes data types, relationships and data constraints etc. It also includes a set of basic
operations for specifying retrievals and updates on the database. Also dynamic aspect or behavior
of a database application is included in a data model.
Categories of Data Models
Data models are categorized based on the types of concepts that they provide to describe the
database structure.
i) High-level or conceptual data model

ii) Low-level or physical data model


iii) Representational or implementation data model
i) High level or conceptual data model
This model provides the concepts that are close to user views. The end users can understand
them. It uses concepts such as entity, attributes and relationships.
An entity represents a real world object or concept such as an employee or a project that is
stored in a database.
An attribute represents some property of interest that further describes an entity, such as
employees name or salary.
A relationship among two or more entities represents an association among two or more entities,
for example, relationship between an employee and a project.
ii) Physical or low-level model
This model provides the concepts that are close to the computer view of storage. They are meant
for computer specialists. It describes how data is stored as file in the computer by representing
information such as record formats, record orderings and access paths. An access path is a
structure that makes the search for particular database records efficient.
iii) Representational model
This model is in between these two extremes of the physical model and the conceptual model.
This does not hide all the storage details from the user and it can be implemented on a computer
system directly. It represents data by using record structures and hence is sometimes called
Record-based data model.
The current DBMS use representational (implementation) data models widely. These include the
three most popular data models:
Relational model
Network model
Hierarchical model

These data models represent data by using record structure and hence they are also called as
record-based data models.
2.2.2 Schemas And Instances, Database State
The database description is called as Database Schema (metadata). The database schema is
defined during the database design only and it is not expected to change frequently. Many of the
data models provide the diagrammatic representation of the schema, known as the Schema
diagram. Here, only the database structure is displayed without any actual data. Each object in
the schema is called a Schema construct, such as student, course, section, etc.
STUDENT
NAME

STUDNO

MOBILE

MARKS

COURSENO

CLASS

MARKS

COURSENO

SEMESTER

YEAR

SECTIONID

GRADE

COURSE
COURSENAME
SECTION
SECTIONID
GRADE_REPORT
SECTIONID
PRE_REQUISITE
COURSENO

PRE_REQUISITENO

Figure 2.1 Schema Diagram for University Database


But a schema diagram does not represent all the aspects of a database structure. In the above
schema diagram, neither the data type of each item is shown nor the relationship among the
various files object. Also it does not include the constraints.
In the database, actually the data may change from time to time. For example, if we want to add
a new student or change the section of a student, we have to make changes in the database.

2.2.3 Database states


The data contained by the database at any particular moment in time is called the state of the
database or snapshot. It also known as current set of occurrences or instances in the database.
Each construct of the database will have its own state at any instance of time. If changes are
made in the database, it changes the state of the database into another state.
When a new database is defined, its database schema is specified only to the DBMS. At this
point of time, the database is in Empty State with no data.
When the database is first loaded with initial data, it is said to be in the Initial State.
Every time an update operation is performed on the database, it enters another database state.
The state of the database at any instance of time is called the Current State.
The DBMS stores the description of schema and constrains (metadata) in the DBMS catalog and
the data in the database. User programs establish a communication between these two to get the
required output. Thus it is the responsibility of DBMS to see that every state of database is a
valid state by carefully designing the schema. A database is said to be in a valid state if the state
satisfies the structure and constraints specified in the schema.
As the requirements of an application change, the database schema is subject to changes
occasionally. This is called Schema evolution.

2.3. DBMS ARCHITECTURE


DBMS architecture consists of three levels known as Three Schema Architecture. It is a
convenient tool with which the use can visualize the schema levels in a database system. It
contains the following three schemas:

1. The Internal Level: This consists of an internal schema, which describes the physical
storage structure of the database. It is the lowest level of abstraction. It does not hide the
storage details. It contains the definition of the stored record, the method of representing
the data fields and the access aids used. This internal schema uses a physical data model
and describes the complete details of data storage and access paths for the database. It is
also called as the Physical Schema.
2. The Conceptual Level: This has a conceptual schema, which describes the structures of a
database for a group of users. This schema hides the storage details from the user and it
includes description of entities, data types, relationships, user operations and constraints.
The description of data at this level is in a format independent of its physical
representation.
3. The External Level: This has a number of external schemas or user views. Each external
schema describes the part of the database that a particular user group is interested in and
hides all the other details from this group (all other databases). This is at a highest level
of database absorption.

End Users
External

External view 1

Level

External/Conceptual
Mapping

..

External view N

Conceptual Level

Conceptual schema

Conceptual/Internal
Mapping

Internal Level

Internal schema

Stored
Database

Figure 2.2 Three-schema DBMS architecture


The three schemas are only description of the data but the actual data exists only at physical
level. In a three-level architecture based DBMS; the user refers to its own external schema only.
Then the DBMS must transform the request specified by the external schema into a request
against the conceptual schema and then into a request on the internal schema for processing over
the stored database. If the request is to retrieve a database, it has to be reformatted to meet the
user requirements. This process of transforming requests and results between the three levels of
schemas is known as Mapping.
MAPPING
The transformation of requests and results between the three levels are called Mappings.
As shown in the figure, two mappings are required for a three-level DBMS. The two types of
mapping are

External/Conceptual Mapping
Conceptual/Internal Mapping
External/Conceptual Mapping
The mapping between external and conceptual level, gives the correspondence between records
and relationships of external and conceptual levels.
Conceptual/Internal Mapping
The mapping between conceptual and internal levels gives the correspondence between the
structure (metadata) and the size and position of the data in the stored database.
So the three level schema architecture is a convenient tool with which the user can visualize the
schema levels in a database system.
2.4 DATA BASE SYSTEM ARCHITECTURES
2.4.1 CENTRALIZED ARCHITECTURE
With the use of personal computers and workstations, the database systems were centralized in
which all the DBMS functionality, application program execution and user interface processing
were carried out on one machine. Figure 2.4 shows the physical components in a centralized
architecture.
In a centralized database system, always the data and the DBMS software reside only at one
place or site. Other sites, which are connected to this centralized site can access the database. In a
centralized system if the central system fails, then the entire system will be halted.

Display

Display

Monitor

Monitor

Display
.

Monitor

Network

Application
Programs

Terminal Display

Text
Editor

control

..
DBMS

Compilers

Controller

Controller

Controller

CPU
Memory

Disk

I/O devices
(printers, tape
drives..)

Figure 2.4 Physical centralized DBMS architecture

2.4.2 DISTRIBUTED ARCHITECTURE


As the complexity of the database increases, it becomes difficult to handle the database entirely
at one place. The solution for this is to distribute the database into different sites, which are
interconnected through some communication network. This type of database framework is called
Distributed database (DDBS) and the software required for this as Distributed Database
Management System (DDBMS).
Advantages of Distributed DBMS
1. Naturally distributed data
2. Reliability and availability
3. Controlled data sharing

4. Improved performance
Naturally distributed data
In some cases data is naturally distributed over different areas. For example, a company may
have its head quarters in one place and its branches may be situated in some other places. In such
cases, the different sites maintain only their local data, coming under their limitations. This is
just like a small mini world, having all the features of the database, but only local data. Only
the global users, such as the head quarters will refer to these local sites, whenever need arises.
Reliability and Availability
In a distributed system, because the data is distributed over different sites, even if one site fails,
the system will continue working with other sites. Only the data under the failed site will not be
accessible. Thus Distributed DBMS are more reliable, compared to centralized systems. Also,
since each site handles only limited amount of local data, the system is relatively less busy. This
increases the probability of the system being available continuously at any given time interval.
Duplicating the data and the software at different sites also improves the performance.

Controlled data sharing


Since each site controls its data and the software, it is called localized control. Occasionally,
some other sites may access or share the data of another site, through the software. Hence,
controlled sharing of data is achieved.
Improved performance
In Distributed DBMS a large database is divided into smaller databases at different sites. Hence,
the size of the databases at different sites is limited, and because of this limited size, the
complexity is reduced at every stage, thereby improving the performance. As a result, minimum
transactions will be there and data will be accessed faster. If data is shared from different sites,
processing may proceed in parallel at different sites.

The Distributed DBMS software must have some additional functions such as:
Ability to access remote sites through the network
Keeping track of data distribution and replication in the DBMS catalog
Defining strategies for queries, which share the data from different sites.
Ability to maintain the consistency of replicated data at different sites.
In a DDBMS, the data, which logically belongs to a single database, is physically distributed
over different sites. The sites may be located close to each other, say in adjacent buildings,
connected by Local Area Network (LAN) or they may be located geographically apart,
connected by long haul network (connected through telephone line or satellites). Physically, it is
achieved by a model called Client-server architecture.
2.4.3 CLIENT-SERVER ARCHITECTURE
The client/server architecture was developed to deal with computing environments in which a
large number of PCs, workstations, file servers, printers, database servers, web servers and other
requirement are connected via a network.
In client server architecture, number of personal computers, workstations, servers and printers
are involved. Each server is assigned a specific functionality. For example, software server stores
the software required by all its clients, whereas the client machine can only store the programs
and for executing these programs, they should get connected to the server. Similarly we have file
server that maintains the files of the client machines. We have the printer server which is
connected to various printers and all print requests by the clients are forwarded to this machine.
The resources provided by specialized servers can be accessed by many client machines. The
client machines provide the user with the appropriate interfaces to utilize these servers, as well as
with local processing power to run local applications.
There are two client/server frameworks.
Two-tier
Three-tier
2.4.3.1 TWO-TIER CLIENT/SERVER BASED ARCHITECTURE

In two tier architecture, the software components are distributed over two systems: the client and
the server.
Figure 2.5 shows client/server architecture at the logical level.

Client

Client

Client

Print

File

DBMS

Server

Server

Server

..

Figure 2.5 Logical two-tier client/server architecture


Figure 2.6 shows the physical client/server architecture. Some machines like diskless
workstations, or workstations/PCs with disk that have only client software installed would be
only client sites. Other machines would be dedicated servers. Some other machines would have
both client and server functionality.

SERVER
CLIENT

Site 1

CLIENT

Site 2

SERVER

Site 3

omm
Communication Network

CLIENT

Site n

Figure 2.6 Physical two-tier client-server architecture


A client is a user machine that provides user interface capabilities and local processing. When a
client requires access to additional functionality that does not exit at that machine, it connects to
a server that provides the needed functionality.
A server is a machine that can provide services to the client machines such as file access,
printing, archiving, or database access.
Some machines install only server software, some install only client software, others include
both client and server software.
Advantages of two tier architecture
1. Simplicity
2. Compatibility with existing systems

2.4.3.2 THREE-TIER CLIENT/SERVER ARCHITECURE


With the emergence of World Wide Web, many web applications use the three-tier architecture.
There is an intermediate layer between the client and the database server called the application
server or web server and is named according to the application.
Client

GUI, Web Interface

Application Programs
Application Server

Web pages

or web Server
Database Management
Database Server

System

Figure 2.7: Logical three-tier client/server architecture

The web server plays the intermediary role by storing business rules (procedures or constraints)
that are used to access data from the database server. It checks the clients credentials before
forwarding a request to the database server. The intermediate server accepts requests from the
client, processes the request and sends database commands to the database server, and then acts
as a conduit for passing processed data from the database server to the clients, where it may be
processed further and filtered to be presented to users in GUI format. The user interface,
application rules and data access acts as the three tiers.
2.5. Type and Use of the Database Interface
Following, two important uses of a database interface like SQL are listed:
Interactive

SQL can be used interactively from a terminal.

Embedded

SQL can be embedded into another language (host language) which


might be used to create a database application.

2.6 CLASSIFICATION OF DBMS


DBMS may be classified in various ways.
(i) Based on data models
(ii) Based on number of users that are supported by the system
(iii)Based on number of sites that the database is distributed
(iv) Based on the purpose

(i) Based on the data models


DBMS may be classified as
1. Relational
2. Network
3. Hierarchical
4. Object-oriented

5. Object-relational
Relational Model
The principles of the relational model were first outlined by Dr. E.F. Codd. The Relational Data
model represents a database as a collection of tables, where each table can be stored as a separate
file. Most relational databases have high-level query languages and support a limited form of
user views. RDBMS relational database management system is based on the relational model
developed by E.F. Codd. A relational database allows the definition of data structures, storage
and retrieval operations and integrity constraints. In such a database, the data and relations
between them are organized in tables. A table is a collection of records and each record in a table
contains the same fields.
Term

Definition

Example

Entity

A person, place, thing or even

Employee, company, book

Attribute

A property of an entity

For a book: code, title, type, price

Relationship

Association between entities

A book with an author

For example, consider the table EMPLOYEE.


Emp_code

Name

Designation

Qualification

Department

Salary

E001

Vani

Manager

PG

R&D

25000

E002

Preetha

Ass. Manager

PG

Accounts

20000

E003

Aditi

Supervisor

PG

Production

20000

E004

Vidya

Assistant

Diploma

Production

5000

The entity is Employee. The attributes are Emp_code, Name, Designation, Qualification,
Department and Salary. Vidya is a data item for the attribute Name. PG is a data item for the
field Qualification.
Properties of Relational Tables
Values are atomic
Each row is unique
Column values are of the same kind

The sequence of columns is insignificant


The sequence of rows is insignificant
Each column has a unique name
The relational database model is based on the Relational Algebra, Relational calculus.
Many systems use the relational model. DB2 by IBM and ORACLE 7 are Relational DBMSs, as
is ACCESS in the personal computer environment.
Network Model
The Network Data Model represents data in terms of records and also 1:N relationship, called as
a Set type. Figure 2.4 shows a network schema diagram, for the database, where rectangular
boxes represent record types and arrows show set types. This model is associated with record-ata-time language that must be embedded in a host programming language.
The network model permitted the modeling of many-to-many relationships in data.

The data

model is a simple network, and link and intersection record types (called junction records by
IDMS) may exist, as well as sets between them. Usually, a set defines a 1:M relationship,
although 1:1 is permitted.

STUDENT

COURSE

SECTION

GRADE-REPORT

PRE-REQUISITE

Figure 2.4 Network schema diagram


Publisher

Bank-Author

Book

Author

Network database structure

Book-

Branch

Branch
Figure 2.5 Network database model
A network database consists of two data sets
A set records
A set links
The record types are made up of fields in the usual way. Links may be followed in their
direction.
Characteristics of Network Model
1. Data is organized into logical units called record types
2. Record types can have more than one owner (each data grouping can belong to more than
one category)
3. There are many efficient entry points into the database
Hierarchical Model
In a hierarchical data model, data is organized into a tree-like structure. Each hierarchy
represents a number of related records. There is a hierarchy of parent and child data segments.
This structure implies that a record can have repeating information, generally in the child data
segments. Data in a series of records, which have a set of field values attached to it.

It collects all the instances of a specific record together as a record type. These record types are
the equivalent of tables in the relational model, and with the individual records being the
equivalent of rows. To create links between these record types, the hierarchical model uses
Parent Child Relationships. These are a 1:N mapping between record types.
For example, an organization might store information about an employee, such as name,
employee number, department, salary. The organization might also store information about an
employees children, such as name and data of birth. The employee and children data forms a
hierarchy, where the employee data represents the parent segment and the children data
represents the child segment. If an employee has three children, then there would be three child
segments associated with one employee segment. In a hierarchical database the parent-child
relationship is one to many. This restricts a child segment to having only one parent segment.
The first database systems developed were hierarchical. Hierarchical DBMSs were popular from
the late 1960s, with the introduction of IBMs Information Management System (IMS) DBMS,
through the 1970s.
The language used for hierarchical model is usually a record-at-a-time language.
Characteristics of Hierarchical Data model
1. Data is grouped into logical units called segment types.
2. Segment types are linked in a parent/child hierarchical where each child segment type has
only one parent segment type.
3. There is only one efficient entry point.
4. Database use physical pointers to link records.
Advantages
1. The structure of a hierarchical database is easy to understand. The hierarchy is based on
logical relationships between parent and child segment types.
2. They provide very efficient high speed retrieval.
Disadvantages
1. There are difficult to modify as an organizations data needs change.

2. It is difficult to use them to represent non-hierarchical relationships

The Figure 2.6 represents a hierarchical database at a typical school where each department
employs many faculties and each faculty advises many students.

Department

Faculty
Faculty

Student

Student

Figure 2.6 The hierarchical model


Comparison between Hierarchical and Network model
Characteristic

Hierarchical

Network

Data Grouping

segment types

record types

Relationship Between

restricted to one parent

allows more than one owner

Grouping Types

(owner)

many efficient entry points

Data Access

one efficient entry point

Can have any number of

Number of Parents

Exactly one parent

parents (also zero)

(ii) Based on the number of sites on which database is distributed


DBMS can be classified as:
1. Centralized DBMS
2. Distributed DBMS
3. Homogeneous DBMS
4. On-line transaction processing DBMS
Centralized DBMS contains the database at only one computer site, which can support multiple
users.
Distributed DBMS contains the database and the DBMS software distributed over number of
computer sites, connected by a network.
Homogeneous DBMS is the same DBMS existing in multiple sites.
The recent trend is to develop software to access several autonomous pre-existing databases
stored under heterogeneous DBMS.
One-line transaction processing (OLTP) Reservation clerks at railways and airlines use special
purpose DBMS software that supports a large number of concurrent transactions without
imposing excessive delays. This is an example of On-line transaction processing system.
Centralized
1) Database is maintained at
one site.
2) If centralized system fails, entire system is
halted.
3) Less reliable

Distributed
1) Database is maintained at a number of
different sites.
2) If one system fails system continues work
with other sites.
3) More reliable

(iii) Based on purpose


DBMS could be classified as:
1. Special purpose
2. General purpose
Special purpose DBMS is a package developed to suit the needs of a particular user, which
cannot be used by other users. Reservation clerks at Railways and Airlines use this type of
packages. These fall into the category of on line transaction processing systems, which must
support a large number of concurrent transactions without imposing excessive delays.
REVIEW AND SELF ASSESSMENT QUESTIONS
1.

What is a Data Model? Name the three categories of data models.

2.

Explain the three categories of data models?

3.

What are types of representational models? Name any three popular models.

4.

Explain various database states.

5.

What is schema evolution?

6.

Explain the three level DBMS architecture with a neat diagram

7.

Explain the Data base System architecture with a neat diagram

8.

What is metadata?

9.

Write a note on DBMS interfaces.

10. Explain classification of DBMS.


11. Differentiate between hierarchical and Relational Models.
12. Explain Hierarchical Model.
13. Explain Network Model.
14. Explain Relational Model.
15. Discuss any three differences between distributed DBMS and centralized

CHAPTER 3

DATA MODELING USING ENTITY-RELATIONSHIP MODEL


High level, conceptual data models for Database Design with a example, Entity types, Entity
sets, Attributes and Keys, ER Model Concepts, Notation for ER Diagrams, proper naming of
schema constructs, Relationship types, degree of relationship and Data Dictionary.
3.1. INTRODUCTION
Database application means a particular database and the associated programs that implement the
database queries and updates. In designing a successful database application, it is important to
have conceptual modeling. The Entity-Relationship (ER) model is popular high-level conceptual
model. It describes the structuring of data and constraints and their use in the design of
conceptual schema for database applications. The diagrammatic notation associated with the
ER Model is called ER diagram.
3.2. ROLE OF HIGH LEVEL CONCEPTUAL DATA MODELS FOR DATABASE
DESIGN
The database design begins with the software requirement specifications of the given problem.
The precise requirement collection is very important to have a good database design. This is then
to be analyzed.
The next step is to create a conceptual schema that describes the data type, relationships, and
constraints. This is called the Conceptual design. It deals with the high-level data descriptions of
the problem and hence implementation details are hidden.
The implementation model consists of two phases:
Logical database design
Physical database design
The database design process consists of the following steps.
1. Requirement collection and analysis
2. Conceptual design
3. Logical design (Implementation of data model)

4. Physical design

Requirement collection and analysis


This phase is also called Feasibility phase. During this phase, the database designers interview
database users. This helps them to understand and document their data requirements.
Conceptual design
Conceptual schema for the database is created using a high-level conceptual data model. The
conceptual schema describes data requirements of the users, entity types, relationships and
constraints. These concepts do not include implementation details.
Logical design
This step involves implementation of data model. Since the conceptual schema is transformed
from the high level model into the implementation data model, this step is also known as Data
model mapping.
Logical Database Design is the process of designing a model of the information in an
enterprise based on the chosen database model. This result in a database schema in an
implementation data model such as relational or object-relational database model.
Physical design
During this step, the internal storage structures, indexes, access paths and file organizations for
the database files are specified.
Physical database design is the process of describing the implementation of the database on
the disk. It describes the internal storage structures, indexes, access paths, base relations,
security issues and constraints.
Overall Database Design involves the following steps:
a. Identifying all the required files.
In database terminology, files are called record types and/or Entities.

b. Identifying the fields of each of these record types.


Fields in database terminology are called attributes. Note that attributes in semantic
meaning of databases are called properties of Entity/Table.
c. Identifying the primary key of each of these files.
Note that the primary key is a field that uniquely identifies a specific record in the file.
d. Identifying the relationship between record types.
The designer must identify the manner in which information from different files are related and
accessed. This is achieved by identifying the foreign and primary keys. A foreign key is a field
within a file, which is the primary key of another file. The secondary key is a field that identifies
a unique record or group of records with a specific characteristic.
Example Database Application
Let us consider an example database application that demonstrates the basic ER model concepts
and their use in schema design.
i)

First the data requirements for the database are to be listed.

ii) Then its conceptual schema should be created step by step.


Consider COMPANY database. This database keeps track of employees, departments and
projects.
The company can be organized into departments. Each department can have a unique name,
unique number and a particular employee who is the manager of the department.
A department controls a number of projects, each of which has a unique name, a unique number.
Each employee will have employee name, empid, address, salary, gender, date_of_birth.
An employee will belong to one department but many works on many projects that may not
necessarily be controlled by the same department. Number of work hours per week of the
employee is to be noted.

Also each employee may have one or more dependents. These details are to be maintained for
insurance purposes. The dependents name, sex, birth date and relationship to the employee are
maintained.
The Company Database will have the following relations with their respective attributes.
Department: DeptNo, DeptName, Dlocation, Manager
Project: ProjNo, ProjNName, Plocation
Emmployee: EmpId, EmpName, Address, Sex, Salary, DOB
Dependent: DepName, gender, DOB, Relnship
INITIAL CONCEPTUAL DESIGN OF THE COMPANY DATABASE:
Now we can identify four entity types for the above company database, namely department,
project, employee and dependent with the following features:
1. Entity type: DEPARTMENT
Attributes: DName, DNumber, DLocations, DManager
Key attributes: DName and DNumber
Multivalued attribute: DLocations
2. Entity type: PROJECT
Attributes: PName, PNumber, PLocation and Pmanager
Key attribute: PName, PNumber
3. Entity type: EMPLOYEE
Attributes: eName, eSN, eSex, eAddress, eSalary,eBirthdate, Department and supervisor.
Address may be a composite attribute.
4. Entity type: DEPENDENT
Attributes: Empployee, dpid, DependentName, Sex, Birthdate and Relationship to the
employee.
Since each employee may work on different projects, number of hours worked on each project
may be represented by a multivalued composite attribute of employee called WorksOn, with
simple components, Project and Hours (Project, Hours)}.

3.3. ENTITY AND ATTRIBUTES


Entity
The E-R model describes data in terms of entities, relationships and attributes. An entity is a
thing in the real world with an independent existence. An entity may be an object with a physical
existence, such as a person, a car, a building or it may be an object with conceptual existence
such as a course, a company or a job.
Entity:

STUDENT

Attributes: Name

Age

Value:

Rogers

20

Address
12, Kicukiro

COMPANY

Name
MTN

H.Q.

President

Kagali

Rogers

Figure 3.2 Entity, Attribute, Value


ATTRIBUTE
An attribute is a property that describes an entity. Ex. The attributes of a person entity are his
name, address, job, salary, etc. For an entity, each attribute will have a value.
Consider 2 entities, student and an employee.
For student entity, the attributes are register number, student name, age, and address.
STUDENT (Regno, Name, Age, Address)
For an employee entity, attributes may be employee id, name, department and salary.
EMPLOYEE (Id, Name, Dept, Salary)
Similarly for a company entity the attributes may be name, headquarters and president of the
company.
COMPANY (Name, HQ, President)
Types of Attributes

There are different types of attributes in E-R model, namely:


Simple
Composite
Single-valued
Multi-valued
Stored
Derived
Null values
(i)

Simple attribute (Atomic attribute)

Attributes that cannot be further subdivided are called simple or atomic attributes.
Consider the address attribute of student entity.
For example Regno, Deptno, CourseID are three attributes which cannot be subdivided for
further divided and are called simple attributes or Atomic attributes.
(ii)

Composite attribute

A composite attribute can be divided into smaller subparts, which represent more basic attributes
with independent meanings. Composite attributes are used, only if there is a need to refer to the
attribute as a whole, as well as part by part.
For example the Address attribute can be sub-divided into House-number, Street_Name, Area,
City, and Country.
Another example would be the Name attribute which can be composed of FirstName,
MiddleName and LastName.
(iii) Single valued attribute
An attribute that can take only one value at a time is called Single valued attribute. Usually, for a
particular entity, each attribute will have a single value.
For example the Age attribute will have a single value.

(iv) Multi valued attribute


Some attributes may have more than one value for the same attribute. These attributes are called
multi-valued attributes.
For example, a University degree attribute can have multiple values. A person may be
possessing one or two degrees or a person may not have any degree at all. So different persons
can have different number of values for the degree attribute and hence it is a multi-valued
attribute.
Degree {BIT, MBA, Msc.IT, PhD}
Another example is DeptLocation. A department may be located in more than one place in
which case it is said to be multi-valued.
DeptLocation {KIGALI, RUBAVU}
Some cars may have two or more colors. Then color is considered as multi-valued attribute.
Carcoolor {Red, Black}
(v)

Stored attribute

The value of certain attributes cannot be obtained or derived from some other attributes. They
cannot be derived. Such attributes are said to be stored attributes.
Two or more attributes may sometimes be related. For example, age and birthdate are two
attributes that are related. If birthdate is known, with the help of current todays date, age can be
determined.
Birthdate is then said to be stored attribute.
(vi) Derived attribute
If the value of an attribute can be derived from some other attributes, then such attributes are
called derived attributes. Since age can be determined by knowing birthdate and current date, the
age attribute is said to be derived attribute.

Another example is GrossPay of an employee. This can be derived by knowing BasicPay,


allowances and deductions for an employee.
(vii) Null Values
Sometimes, for an entity, a particular attribute may not have any applicable value. For such
attributes, null value is assigned. Null value is a special value used in database applications. Null
value can be assigned to the attribute in two cases:
1.

A particular attribute is not applicable at all for the entity. Ex. For undergraduate
students, the attribute degree is not applicable at all. In that case, a null value may be
assigned to that particular attribute.

2.

The value of the attribute is not known, but it is applicable. Ex. If the age of a particular
student is not known, then also a null value can be assigned to it.

For example PhoneNo attribute may contain null value in an employee entity set as some
employees may not have telephone number.
Also email attribute may contain null value in an employee entity set as some of the employees
may not have an email id.
3.4. RELATIONSHIP
3.4.1. Relationship Definition
A relationship is an association among two or more entities. A relationship captures how two or
more entities are related to one another. Relationships can be thought of as verbs, kinking two or
more nouns.
Examples:
(i) An owns relationship between a company and a computer
(ii) a supervises relationship between an employee and a department
(iii)a performs relationship between an artist and a song
(iv) a proved relationship between a mathematician and a theorem

Relationships are represented as diamonds, connected by lines to each of the entities in the
relationship in an ER diagram.
3.4.2. Relationship Types, Sets and Instances
Relationship Types
A relationship type R among n entity types E1, E2,..defines a set of associations among
entities from these types.
Example: Relationship type work-for between two entities of company database, namely
EMPLOYEE and DEPARTMENT.
3.4.3. Relationship between Entity
The relationship between entity sets are of four types. They are:
(i) One-to-many (1:M),
(ii) Many-to-one (M:1),

Many-to-many (M:N),

(iii)One-to-one (1:1).
One-to-one relationship (1:1 relationship)
The 1:1 relationship between entity sets E1 and E2 indicates that for each entity in either set
there is at most one entity in the second set that is associated with it.
Example: Relationship between Department and Manager.
Department

Manager
Figure 3.5 One-to-one relationship

The relationship between a Department and a Manager is usually one-to-one; there is only one
manager per department and a manager manages only one department. There could be times
when a department is without a manager or when an employee who is classified as a manager
may be without a department to manage. It does not imply that for an occurrence of an entity
from one set at any time there must be an occurrence of an entity in the other set.
One-to-many relationship (1:M relationship)
The 1:M relationship from entity set E1 to E2 indicates that for an occurrence of the entity from
the set E1, there could be zero, one, or more entities from the entity set E2 associated with it.
Each entity in E2 is associated with at most one entity in the entity set E1.
Example: Relationship between Manager and Employee.

Figure 3.6 One to many Relationship

Manager

Employee

A 1:M relationship exists from the entity Manager to the entity Employee because there are
several employees reporting to the manager. There could be an occurrence of the entity type
Manager having zero occurrences of the entity type Employee reporting to him or her.
Many to One Relationship (M:1)

Employee

Manager

Figure 3.7 Many-to-one relationship

A reserve relationship, from Employee to manager, would be many-to-one, since many


employees may be supported by a single manager.
Many-to-Many relationship (M:N relationship)
In the M:N relationship between entity sets E1 and E2, there is no restriction as to the number of
entities in one set associated with an entity in the other set. The database structure, employing the
E-R model is usually shown pictorially using entity-relationship (E_R diagram).
Example: Relationship between employee and Project.

Employee

Project
Figure 3.8 Many to Many relationship

Each employee could be involved in a number of different projects, and a number of employees
could be working on a given project.
TEACHER
JB
PRINCE
OLIVIER
JOHN

TEACHES
r1
r2
r3
r4
r5
r6

CLASS
I BBIT
II BBIT
III BBIT

Figure 3.9 M:N relationship Teaches Relationship set


3.5. ER DIAGRAM NOTATION
An entity-relationship diagram is a graphical depiction of organizational system elements and the
association among the elements. E-R diagrams can help define system boundaries. The elements
that make up a system are referred to as entities. A relationship is the association that describes
the interaction between entities.
An E-R diagram may also indicate the cardinality of a relationship. Cardinality is the number
of instances of one entity that can, or must, be associated with each instance of another entity.
In general we may speak of one-to-one, one-to-many, or many-to-many relationships.
The E-R notation also allows for distinguishing different types of entities. An entity is an object
or concept about which some information is stored. A plain rectangle is used for what is termed a
fundamental entity, that is, an entity that is a real thing (person, place, or thing). The term
associative entity is used for something that is created that joins two entities (for example, a
receipt that exists when a customer and salesperson complete a transaction). And, the term
attributive entity is used for data that is dependent upon a fundamental entity and is useful for
describing attributes (for example, to identify a specific copy of a movie title when a video store
has multiple copies of each movie).

Associative

Fundamental
Entity

Attribute

ENTITY

Weak
entity

Entity types that do not have key attributes of their own are called weak entity types.
Attribute

Attribute is a property or characteristic of an entity

Key

It is an attribute that uniquely identifies an entity. It is the unique distinguishing characteristic of


the entity. For example, regno is the key attribute for student. Employee code is the key attribute
for employee.
Multivalued
attribute
A multivalued attribute can have more than one value. UniversityDegree, DeptLocation,
Carcolor are examples of multivalued attributes.

Derived
attribute

A derived attribute is based on another attribute. For example, age is a derived attribute that can
be derived from dateofbirth and current date.

Relationsh

Relationship specifies how two entities share information in the database.


Cardinality is shown by a series of tick marks and crows feet superimposed on the
relationship lines.
Entity

Relationship

0 or 1

1 or more

0,1, or more

many

In the following example each student fills one seat in a class. Each seat is filled by one student.
(In this usage a seat implies not only a physical place to sit but also a specific day and time).
This is one-to-one relationship.

Student
Seat
Fills teach several courses. Each course has only one
In the next example a single instructor may
instructor.
This is a one-to-many relationship.

Instructor

Teache

Course

As shown below, a single student may register for several courses. A single course can have
many students enrolled in it. This is the many-to-many relationship.

Student

Takes

Course

The next example shows a relationship in which it is possible that no instances exist. Each
professor may teach several course sections but may not teach at all. Assume there is no team
teaching; therefore each section must have a single professor.
Professor

Teache

Section

Finally, a more complex example which shows more than one relationship. All of the examples
above depict single relationships. An actual E-R diagram would show the many entities and
relationships that exist within a system. Here each department offers at least one course; there is
no cross-listing of courses with other departments. Each course must have at least one section but
often has several sections.

Professor

Teache

Section

Each relationship instance in WORKS FOR associates one employee with one department
entity. This is shown in the figure, where relationship instances e1, e2 work for Department d1,
relationship instances e2, e6 and e7 work for department d3. - _
Proper Naming of Schema Constructs
While designing a database schema, names for entity types, attributes, relationship types and
roles should be chosen such that they convey meanings attached to the different constructs in the
schema, as much as possible.
Entity type names are chosen to be singular. Entity type and relationship type names are in
uppercase letters.
In a narrative description of the database requirements, nouns give rise to entity type names
and verbs tend to indicate names of relationship types.
Attribute names arise from additional nouns describing the entity nouns. Binary relationship
names are listed left to right or bottom to top in an ER diagram.
The Entity-Relationship diagram for the relation works for can be shown as follows:

SSN
Salary

Salary

Addres

EMPLOYEE

Departme

Relationship E-R Diagram for the Relationship Supply

Code

Numb

VENDOR

Address

3.5 Schema based Constraints


The schema based constraints are discussed in details as they are used in the relational model.
They are:
Domain constraints
Key constraints
Constraints on nulls
Entity integrity constraints
Referential integrity constraints
Domain Constraints
Within a tuple, the value of each attribute A must be an atomic value from the domain dom(A).

The data types include standard numeric data types for integers and real numbers. There are also
characters, Booleans, fixed-length strings and variable length strings, date, time, timestamp and
so on.
Other possible domains may be described by a subrange of values from a data type or as an
enumerated data type in which all possible values are explicitly listed.
Key constraints
All tuples inn a relation must be distinct. No two tuples have the same combination of values for
all their attributes
A super key SK specifies a uniqueness constraint that no two distinct tuples in any state r of
relation R can have the same value for SK. Every relation has at least one default super key
which is the set of all its attributes. A super key can have redundant attributes.
A key K of a relation schema R is a super key of R with the additional property that removing
any attribute A from K leaves another set of attributes K that is not a super key of R any more.
A key satisfies two constraints.
i. Two distinct tuples in any state of the relation cannot have identical values for all the
attributes in the key.
ii. It is a minimal super key, i.e. a super key from which we cannot remove any attributes
and still have the uniqueness constraint hold.
Key constraint is a constraint that should hold on every valid relation state or the schema. The
value of a key attribute can be used to identify uniquely each tuple in the relation. A relation is a
set of tuples, where each tuple is distinct i.e. no two tuples can have the same values for all the
attributes. Usually there will be at least one set of attributes, for which two tuples will not have
same combinations of values and such a set of attributes is called the key of the relation schema
R.
In the student relation, REGNO can be considered as a key because no two tuples can have the
same value for REGNO. This key can be used to identify the tuples, since it is unique. This key
should maintain its uniqueness property, even when some more tuples are added to the relation,

i.e., the property is time invariant. Fields such as name should not be declared as keys, because
name may not be unique.
Sometimes a relation schema may have more than one key. Each of the keys is called a
candidate key. One of the candidate keys can be chosen to be the primary key which is used to
identify tuples in the relation. It is better to choose a primary key with a single attribute or a
small number of attributes.
Constraints on Null Values
Another constraint on attributes specifies whether null values are not permitted. The constraint is
specified as NOT NULL.
Integrity constraints are specified on a database schema and are expected to hold on every
valid database state of that schema.
The constraints that are part of relational model are
Entity integrity constraint
Referential integrity constraint
Entity Integrity
It states that no primary key value can be null. This is because this value only identifies the
individual tuples distinctly in a relation. Null values in two or more tuples means they cannot be
identified and distinguished.
Key constraints and entity integrity constraints are specified on individual relations.
Referential integrity
It is specified between two relations and is used to maintain the consistency among tuples in the
two relations.
It states that a tuple in one relation that refers to another relation must refer to an existing tuple in
that relation.

Example:
DNO of EMPPLOYEE gives the department number for which each employee works. This
should match with the DNUMBER value of some tuple in the DEPARTMENT relation.
EMPLOYEE-SSN, NAME, BDATE, ADDRESS, SALARY, SUPERSSN, DNO
DEPARTMENT-DNUMBER, DNAME, MGRSSN, MGRSTARTDATE
To define referential integrity constraint more clearly, we should know what a foreign key is.
Foreign key
A set of attributes FK in relation schema R1 is a foreign key of r1 that references relation R2 if it
satisfies the following two rules.
i. The attributes in FK have the same domains (S) as the primary key attributes PK of R2.
ii. A value of FK in a tuple t1 of the current state r1 (R1) either occurs as a value of PK for
some tuple t2 in the current state r2 (R2) or is null.
R1 is called the referencing relation. R2 is called the referenced relation. If these two conditions
hold, the referential integrity constraint from R1 to R2 is said to hold.
Referential integrity constraints typically arise from the relationships among the entities
represented by the relation schemas.
In the EMPLOYEE relation, the attribute DNO refers to the department for which an employee
works, hence we call DNO a foreign key of EMPLOYEE, referring to the DEPARTMENT
relation. This means that a value of DNO in any tuple t1 of the EMPLOYEE relation must match
a value of the primary key of DEPARTMENTThe DNUMBER attribute in some tuple t2 of
the DEPARTMENT relation, or the value of DNO can be null, if the employee does not belong
to any department.
Sometimes a foreign key can refer to its own relation, the attribute SUPERSSN of EMPLOYEE
relation, indicating the Social Number of the supervisor under whom the employee works, the
supervisor himself being in the same relation EMPLOYEE. EX. SUPERSSN is a foreign key
that references the EMPLOYEE relation itself.

Most relational DBMSs support key and entity integrity constraints and also support referential
integrity.
Semantic Integrity constraints
These may have to be specified and enforced on a relational database. Such constraints can be
specified and enforced within the application programs that update the database or by using a
general purpose constraint specification language.
Mechanisms such as Triggers and Assertions can be used. However, it is more common to check
for these types of constraints within the application programs than to use constraint specification
languages because they are difficult and complex to use correctly.
Example: The salary of an employee should not exceed the salary of the employees supervisor.
Maximum number of hours an employee can work on all projects is 56
Functional dependency constraint
This establishes a functional relationship among two sets of attributes X and Y. This constraint
specifies that the value of X determines the value of y in all states of a relation. It is denoted as
XY. This can be used as a tool to analyze the quality of relational designs and to normalize
relations to improve their quality.

REVIEW AND SELF ASSESSMENT QUESTIONS


1.

Explain the differences between Logical design and Physical design.

2.

Define the terms. Entity, Attribute,.

3.

Explain various types of attributes with examples

4.

Differentiate between simple and composite attributes.

5.

Differentiate between single-valued and multi-valued attributes.

6.

Differentiate between stored and derived attributes.

7.

Explain Null value of an attribute.

8.

What is a key attribute? Given one example.

9.

What is a composite attribute? Give one example.

10. What do you mean by domain of an attribute? Give one example.


11. Define Relationship.
12. What is Relationship Type? Given one example.

You might also like