DBMS Unit 1

The document outlines the curriculum for a B.Tech course in Database Management Systems, covering key topics such as database design, relational models, SQL, transactions, and data storage. It discusses the applications and history of database systems, highlighting their advantages over traditional file systems, including improved data sharing, integrity, and security. Additionally, it addresses the complexities and disadvantages of DBMS, as well as various data models like the Entity-Relationship and Relational models.


B.Tech
2 YEAR 2 SEM

DATABASE MANAGEMENT SYSTEMS


Unit-1
 Overview of Database Systems
 Introduction to Database Design

Unit-2
 Relational Model
 Relational Algebra and Calculus

Unit-3
 SQL: Queries, Constraints, Triggers
 Schema Refinement & Normal Forms

Unit-4

 Transactions
 Concurrency Control
 Recovery System

Unit-5
 Overview of Storage and Indexing
 Tree-Structured Indexing

Dr. V. Sambasiva Rao
St. Mary's Engineering College, Hyderabad

UNIT-1
CHAPTER-1

Introduction
A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. The collection of data, usually referred to as the database, contains
information relevant to an enterprise. The primary goal of a DBMS is to provide a way to store and
retrieve database information that is both convenient and efficient. Data means known facts that can be
recorded and that have implicit meaning.
Database systems are designed to manage large bodies of information. Management of data
involves both defining structures for storage of information and providing mechanisms for the
manipulation of information. In addition, the database system must ensure the safety of the information
stored, despite system crashes or attempts at unauthorized access. If data is to be shared among several
users, the system must avoid possible anomalous results.
Because information is so important in most organizations, computer scientists have developed
a large body of concepts and techniques for managing data.

1.1 Database System Applications


A database management system is a computerized record-keeping system: a repository, or container, for a collection of computerized data files. The overall purpose of a DBMS is to allow users to define, store, retrieve, and update the information contained in the database on demand. Information can be anything that is of significance to an individual or organization. Because databases touch all aspects of our lives, they are widely used. Some of the major areas of application are as follows:
• Banking: For customer information, accounts, loans, and banking transactions.
• Airlines: For reservations and schedule information. Airlines were among the first to use databases in
a geographically distributed manner - terminals situated around the world accessed the
central database system through phone lines and other data networks.
• Universities: For student information, course registrations, and grades.
• Credit card transactions: For purchases on credit cards and generation of monthly statements.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
• Finance: For storing information about holdings, sales, and purchases of financial instruments such
as stocks and bonds.
• Sales: For customer, product, and purchase information.
• Manufacturing: For management of supply chain and for tracking production of items in factories,
inventories of items in warehouses/stores, and orders for items.
• Human resources: For information about employees, salaries, payroll taxes and benefits, and for
generation of paychecks.
As the list illustrates, databases form an essential part of almost all enterprises today. The
internet revolution of the late 1990s sharply increased direct user access to databases. For instance,

when you access an online bookstore and browse a book or music collection, you are accessing data
stored in a database. When you enter an order online, your order is stored in a database. When you
access a bank Web site and retrieve your bank balance and transaction information, the information is
retrieved from the bank’s database system.

1.2 History of Database Systems


Data processing drives the growth of computers. Automation of data processing tasks predates
computers. Punched cards, invented by Hollerith, were used at the very beginning of the twentieth
century to record U.S. census data, and mechanical systems were used to process the cards and tabulate
results. Punched cards were later widely used as a means of entering data into computers.
Techniques for data storage and processing have evolved over the years:
 1950s and early 1960s: Magnetic tapes were developed for data storage. Data processing tasks
such as payroll were automated, with data stored on tapes. Processing of data consisted of reading
data from one or more tapes and writing data to a new tape. Data could also be input from punched
card decks, and output to printers. Tapes (and card decks) could be read only sequentially, and data
sizes were much larger than main memory; thus, data processing programs were forced to process
data in a particular order, by reading and merging data from tapes and card decks.
 Late 1960s and 1970s: Widespread use of hard disks in the late 1960s changed the scenario for
data processing greatly, since hard disks allowed direct access to data. The position of data on disk
was immaterial, since any location on disk could be accessed in just tens of milliseconds. Data
were thus freed from the tyranny of sequentiality. With disks, network and hierarchical databases
could be created that allowed data structures such as lists and trees to be stored on disk.
Programmers could construct and manipulate these data structures.
A landmark paper by Codd [1970] defined the relational model, and nonprocedural ways of
querying data in the relational model, and relational databases were born. The simplicity of the
relational model and the possibility of hiding implementation details completely from the
programmer were enticing indeed.
 1980s: Although academically interesting, the relational model was not used in practice initially,
because of its perceived performance disadvantages; relational databases could not match the
performance of existing network and hierarchical databases. That changed with System R, a
ground-breaking project at IBM Research that developed techniques for the construction of an
efficient relational database system. The fully functional System R prototype led to IBM’s first
relational database product, SQL/DS. Initial commercial relational database systems, such as IBM
DB2, Oracle, Ingres, and DEC Rdb, played a major role in advancing techniques for efficient
processing of declarative queries. By the early 1980s, relational databases had become competitive
with network and hierarchical database systems even in the area of performance. Relational
databases were so easy to use that they eventually replaced network/hierarchical databases.
The 1980s also saw much research on parallel and distributed databases, as well as initial work
on object-oriented databases.

 Early 1990s: The SQL language was designed primarily for decision support applications, which
are query intensive, yet the mainstay of databases in the 1980s was transaction processing
applications, which are update intensive. Decision support and querying re-emerged as a major
application area for databases. Tools for analyzing large amounts of data saw large growths in
usage.
Many database vendors introduced parallel database products in this period. Database vendors
also began to add object-relational support to their databases.
 Late 1990s: The major event was the explosive growth of the Worldwide Web. Databases were
deployed much more extensively than ever before. Database systems now had to support very high
transaction processing rates, as well as very high reliability and 24×7 availability (availability 24
hours a day, 7 days a week, meaning no downtime for scheduled maintenance activities). Database
systems also had to support Web interfaces to data.

1.3 Database Systems versus File Systems


One way to keep the information on a computer is to store it in operating system files. To allow
users to manipulate the information, the system has a number of application programs that manipulate
the files, including
• A program to modify the data
• A program to add new data
• A program to retrieve the data
• A program to generate monthly statements
System programmers wrote these application programs to meet the needs of the customers.
New application programs are added to the system as the need arises. Thus, as time goes by, the
system acquires more files and more application programs.
This typical file-processing system is supported by a conventional operating system. The
system stores permanent records in various files, and it needs different application programs to extract
records from, and add records to, the appropriate files. Before database management systems (DBMSs)
came along, organizations usually stored information in such systems.
Keeping organizational information in a file-processing system has a number of major disadvantages
• Data redundancy and inconsistency. Since different programmers create the files and application
programs over a long period, the various files are likely to have different formats and the programs
may be written in several programming languages. Moreover, the same information may be duplicated
in several places (files). This redundancy leads to higher storage and access cost. In addition, it may
lead to data inconsistency; that is, the various copies of the same data may no longer agree. For
example, a changed customer address may be reflected in savings-account records but not elsewhere in
the system.
• Difficulty in accessing data. The conventional file-processing environments do not allow needed
data to be retrieved in a convenient and efficient manner. More responsive data-retrieval systems are
required for general use.

• Data isolation. Because data are scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
• Integrity problems. The data values stored in the database must satisfy certain types of consistency
constraints. For example, the balance of a bank account may never fall below a prescribed amount
(say, $25). Developers enforce these constraints in the system by adding appropriate code in the
various application programs. However, when new constraints are added, it is difficult to change the
programs to enforce them. The problem is compounded when constraints involve several data items
from different files.
• Atomicity problems. A computer system, like any other mechanical or electrical device, is subject to
failure. In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent
state that existed prior to the failure. Consider a program to transfer $50 from account A to account B.
If a system failure occurs during the execution of the program, it is possible that the $50 was removed
from account A but was not credited to account B, resulting in an inconsistent database state. Clearly, it
is essential to database consistency that either both the credit and debit occur, or that neither occur.
That is, the funds transfer must be atomic—it must happen in its entirety or not at all. It is difficult to
ensure atomicity in a conventional file-processing system.
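The funds-transfer scenario above can be sketched with SQLite from Python: wrapping the debit and the credit in a single transaction gives the all-or-nothing behavior the text describes. Table and account names here are illustrative, not taken from the text.

```python
import sqlite3

# Illustrative two-account table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 100)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst atomically: both happen or neither does."""
    try:
        with conn:  # the with-block wraps both updates in one transaction
            conn.execute("UPDATE account SET balance = balance - ? WHERE account_number = ?",
                         (amount, src))
            # Simulate a system failure between the debit and the credit:
            if dst == "CRASH":
                raise RuntimeError("system failure mid-transfer")
            conn.execute("UPDATE account SET balance = balance + ? WHERE account_number = ?",
                         (amount, dst))
    except RuntimeError:
        pass  # the transaction was rolled back; the debit did not persist

transfer(conn, "A", "B", 50)      # succeeds: both updates commit
transfer(conn, "A", "CRASH", 50)  # fails mid-way: rollback leaves balances unchanged
balances = dict(conn.execute("SELECT account_number, balance FROM account"))
print(balances)  # {'A': 50, 'B': 150}
```

In a conventional file-processing system, the programmer would have to write this rollback logic by hand for every program; here the DBMS provides it.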
• Concurrent-access anomalies. For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. In such an
environment, interaction of concurrent updates may result in inconsistent data. To guard against this
possibility, the system must maintain some form of supervision. But supervision is difficult to provide
because data may be accessed by many different application programs that have not been coordinated
previously.
• Security problems. Not every user of the database system should be able to access all the data. For
example, in a banking system, payroll personnel need to see only that part of the database that has
information about the various bank employees. They do not need access to information about customer
accounts. But, since application programs are added to the system in an ad hoc manner, enforcing such
security constraints is difficult.

Purpose of Database Systems


These difficulties of the file-processing system, among others, prompted the development of database systems. The advantages of a DBMS are as follows:
Controlling Redundancy: Data redundancy refers to the duplication of data (i.e., storing the same data multiple times). In a database system, a centralized database and centralized control of data by the DBA avoid unnecessary duplication. This eliminates the extra time needed to process a large volume of data, saves storage space, and helps avoid inconsistency.
Improved Data Sharing: DBMS allows a user to share the data in any number of application
programs.
Data Integrity: Integrity means that the data in the database is accurate. Centralized control of the data
helps in permitting the administrator to define integrity constraints to the data in the database. For

example: in customer database we can enforce an integrity constraint, that it must accept the customers
only from Hyderabad and Vijayawada city.
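The city constraint in this example can be sketched as a SQL CHECK constraint, here using SQLite from Python. The table layout and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK constraint mirroring the text's example: only these two cities are accepted.
conn.execute("""
    CREATE TABLE customer (
        customer_id   TEXT PRIMARY KEY,
        customer_name TEXT,
        city          TEXT CHECK (city IN ('Hyderabad', 'Vijayawada'))
    )
""")
conn.execute("INSERT INTO customer VALUES ('c1', 'Ravi', 'Hyderabad')")  # accepted
try:
    conn.execute("INSERT INTO customer VALUES ('c2', 'Kiran', 'Chennai')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS itself rejects the violating row
print(rejected)  # True
```

The constraint is declared once, centrally, instead of being re-coded in every application program.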
Security: Having complete authority over the operational data enables the DBA to ensure that the only means of access to the database is through proper channels. The DBA can define authorization checks to be carried out whenever access to sensitive data is attempted.
Data Consistency: By eliminating data redundancy, we greatly reduce the opportunities for inconsistency. For example, if a customer address is stored only once, there can be no disagreement among the stored values. Updating data values is also greatly simplified when each value is stored in one place only. Finally, we avoid the wasted storage that results from redundant data storage.
Efficient Data Access: In a database system, the data is managed by the DBMS and all access to the
data is through the DBMS providing a key to effective data processing.
Enforcement of Standards: With the centralization of data, the DBA can establish and enforce data standards, which may include naming conventions, data quality standards, etc.
Data Independence: In a database system, the database management system provides the interface between the application programs and the data. When changes are made to the data representation, the metadata maintained by the DBMS is changed, but the DBMS continues to provide the data to application programs in the previously used way. The DBMS handles the task of transforming the data wherever necessary.
Data Administration: When several users share the data, centralizing the administration of data can offer significant improvements.
Concurrent Access and Crash Recovery: A DBMS schedules concurrent accesses to the data in such
a manner that users can think of the data as being accessed by only one user at a time. Further, the
DBMS protects users from the effects of system failures.
Reduced Application Development and Maintenance Time: A DBMS supports many important functions that are common to the many applications accessing data stored in it; this facilitates quick application development.

Disadvantages of DBMS
1) It is somewhat complex. Designers and developers need thorough knowledge of the software to get the most out of it.
2) Because of its complexity and functionality, it uses a large amount of memory, and it needs a large amount of memory to run efficiently.
3) A DBMS typically works as a centralized system, i.e. all users, wherever they are, access the same database. Hence any failure of the DBMS will affect all of its users.
4) A DBMS is generalized software, i.e. it is written to work for many kinds of applications rather than one specific application. Hence some applications may run more slowly than with a tailored solution.

1.4 Data Models


Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints. Data models can be classified into the following categories:
 Entity-Relationship Model: The entity-relationship (E-R) data model is based on a perception of a
real world that consists of a collection of basic objects, called entities, and of relationships among
these objects. An entity is a “thing” or “object” in the real world that is distinguishable from other
objects. Entities are described in a database by a set of attributes. A relationship is an association
among several entities. The set of all entities of the same type and the set of all relationships of the
same type are termed an entity set and relationship set, respectively. It is a semantic data model,
which is a more abstract, high-level data model that makes it easier for a user to come up with a good
initial description of the data in an enterprise. These models contain a wide variety of constructs that
help describe a real application scenario. The overall logical structure (schema) of a database can be
expressed graphically by an E-R diagram, which is built up from the following components:
 Rectangles, which represent entity sets
 Ellipses, which represent attributes
 Diamonds, which represent relationships among entity sets
 Lines, which link attributes to entity sets and entity sets to relationships
Each component is labeled with the entity or relationship that it represents. In addition to
entities and relationships, the E-R model represents certain constraints to which the contents of a
database must conform. One important constraint is mapping cardinalities, which express the
number of entities to which another entity can be associated via a relationship set. The entity-
relationship model is widely used in database design.

 Relational Model: The relational model uses a collection of tables to represent both data and the
relationships among those data. Each table has multiple columns, and each column has a unique
name. Tables are also known as relations. The relational model is an example of a record-based
model. Record-based models are so named because the database is structured in fixed-format records
of several types. Each table contains records of a particular type. Each record type defines a fixed
number of fields, or attributes. The columns of the table correspond to the attributes of the record
type. The relational data model is the most widely used data model, and a vast majority of current
database systems are based on the relational model.

 Object-Based Data Model: Object-oriented programming (especially in Java, C++, or C#) has
become the dominant software-development methodology. This led to the development of an object-
oriented data model that can be seen as extending the E-R model with notions of encapsulation,
methods (functions), and object identity. The object-relational data model combines feature of the
object-oriented data model and relational data model.

 Semi-structured Data Model: The semi-structured data model permits the specification of data
where individual data items of the same type may have different sets of attributes. This is in contrast
to the data models mentioned earlier, where every data item of a particular type must have the same
set of attributes. The Extensible Markup Language (XML) is widely used to represent semi-
structured data.

 Historically, the network data model and the hierarchical data model preceded the relational data
model. These models were tied closely to the underlying implementation, and complicated the task
of modeling data. As a result, they are used little now, except in old database code that is still in
service in some places.

1.5 Levels of Abstraction in a DBMS


A database system is a collection of interrelated files and a set of programs that allow users to access
and modify these files. A major purpose of a database system is to provide users with an abstract view
of the data.
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many database-
systems users are not computer trained, developers hide the complexity from users through several
levels of abstraction, to simplify users’ interactions with the system. The data in a DBMS is described
at three levels of abstraction: the physical, conceptual, and external.

 Physical Level: The physical schema specifies additional storage details. Essentially, the physical
schema summarizes how the relations described in the conceptual schema are actually stored on
secondary storage devices such as disks and tapes. We must decide what file organizations to use
to store the relations and create auxiliary data structures, called indexes, to speed up data retrieval
operations.
 A sample physical schema for the university database follows:
• Store all relations as unsorted files of records. (A file in a DBMS is either a collection of
records or a collection of pages, rather than a string of characters as in an operating system)
• Create indexes on the first column of the Students, Faculty, and Courses relations, the sal
column of Faculty, and the capacity column of Rooms.
Decisions about the physical schema are based on an understanding of how the data is typically
accessed. The process of arriving at a good physical schema is called physical database design.
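The index part of the sample physical schema above can be sketched with SQLite from Python; the Faculty table and the index on its sal column follow the sample schema, while the index name is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Faculty (fid TEXT, fname TEXT, sal REAL)")
# Index on the sal column of Faculty, as in the sample physical schema.
conn.execute("CREATE INDEX faculty_sal ON Faculty(sal)")

# The catalog records the index; queries filtering on sal can now use it
# instead of scanning the whole file of records.
names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(names)  # ['faculty_sal']
```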

 Conceptual Level: The conceptual schema (sometimes called the logical schema) describes the
stored data in terms of the data model of the DBMS. In a relational DBMS, the conceptual schema
describes all relations that are stored in the database. It describes what data are stored in the
database, and what relationships exist among those data. The logical level thus describes the entire
database in terms of a small number of relatively simple structures. Although implementation of
the simple structures at the logical level may involve complex physical-level structures, the user of
the logical level does not need to be aware of this complexity. Database administrators, who must
decide what information to keep in the database, use the logical level of abstraction.
In a sample university database, the relations contain information about entities, such as students
and faculty, and about relationships, such as students' enrollment in courses. All student entities
can be described using records in a Students relation. In fact, each collection of entities and each
collection of relationships can be described as a relation, leading to the following conceptual
schema:

Students(sid: string, name: string, login: string, age: integer, gpa: real)
Faculty(fid: string, fname: string, sal: real)
Courses( cid: string, cname: string, credits: integer)
Rooms(rno: integer, address: string, capacity: integer)
Enrolled (sid: string, cid: string, grade: string)
Teaches(fid: string, cid: string)
Meets_In( cid: string, rno: integer, time: string)
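As a sketch, the first two relations of this conceptual schema can be declared in SQL, here using SQLite from Python; the textbook types string, integer, and real are mapped to TEXT, INTEGER, and REAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Students and Enrolled from the conceptual schema above.
conn.execute("""
    CREATE TABLE Students (
        sid   TEXT PRIMARY KEY,
        name  TEXT,
        login TEXT,
        age   INTEGER,
        gpa   REAL
    )
""")
conn.execute("""
    CREATE TABLE Enrolled (
        sid   TEXT REFERENCES Students(sid),
        cid   TEXT,
        grade TEXT
    )
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['Enrolled', 'Students']
```

Note that nothing here says how the relations are stored on disk; those decisions belong to the physical schema.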

 External level (View Level): The highest level of abstraction describes only part of the entire
database. Any given database has exactly one conceptual schema and one physical schema because
it has just one set of stored relations, but it may have several external schemas, each tailored to a
particular group of users. Each external schema consists of a collection of one or more views and
relations from the conceptual schema. Even though the logical level uses simpler structures,
complexity remains because of the variety of information stored in a large database. Many users of
the database system do not need all this information; instead, they need to access only a part of the
database. The view level of abstraction exists to simplify their interaction with the system. The
system may provide many views for the same database.

Ex:2
Most high-level programming languages support the notion of a record type. For example, in a Pascal-
like language, we may declare a record as follows:
type customer = record
customer-id : string;
customer-name : string;
customer-street : string;
customer-city : string;
end;

This code defines a new record type called customer with four fields. Each field has a name and a type
associated with it.

A banking enterprise may have several such record types, including


• account, with fields account-number and balance
• employee, with fields employee-name and salary
At the physical level, a customer, account, or employee record can be described as a block of
consecutive storage locations (for example, words or bytes). The language compiler hides this level of
detail from programmers. Similarly, the database system hides many of the lowest-level storage details
from database programmers. Database administrators, on the other hand, may be aware of certain
details of the physical organization of the data.
At the logical level, each such record is described by a type definition, as in the previous code
segment, and the interrelationship of these record types is defined as well. Programmers using a
programming language work at this level of abstraction. Similarly, database administrators usually
work at this level of abstraction.
Finally, at the view level, computer users see a set of application programs that hide details of
the data types. Similarly, at the view level, several views of the database are defined, and database
users see these views. In addition to hiding details of the logical level of the database, the views also
provide a security mechanism to prevent users from accessing certain parts of the database. For
example, tellers in a bank see only that part of the database that has information on customer accounts;
they cannot access information about salaries of employees.

1.6 Instances and Schemas


Databases change over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the database. The
overall design of the database is called the database schema. Schemas are changed infrequently.
Database systems have several schemas, partitioned according to the levels of abstraction. The
physical schema describes the database design at the physical level, while the logical schema
describes the database design at the logical level. A database may also have several schemas at the
view level, sometimes called sub-schemas, that describe different views of the database.
Of these, the logical schema is by far the most important, in terms of its effect on application
programs, since programmers construct applications by using the logical schema. The physical schema
is hidden beneath the logical schema, and can usually be changed easily without affecting application
programs.

1.7 Data Independence


A very important advantage of using a DBMS is that it offers data independence. That is, application
programs are insulated from changes in the way the data is structured and stored. Data independence is
achieved through use of the three levels of data abstraction;
The conceptual schema insulates users from changes in physical storage details. This property
is referred to as physical data independence. The conceptual schema hides details such as how the data
is actually laid out on disk, the file structure, and the choice of indexes. As long as the conceptual
schema remains the same, we can change these storage details without altering applications.
Application programs are said to exhibit physical data independence if they do not depend on the
physical schema, and thus need not be rewritten if the physical schema changes.
Users can be shielded from changes in the logical structure of the data, or changes in the choice
of relations to be stored. This property is called logical data independence. Application programs are
said to exhibit logical data independence if they do not depend on the logical schema, and thus need
not be rewritten if the logical schema changes.
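Logical data independence can be sketched with a view: the application program queries the view, so the base relation can be reorganized without rewriting the application. All table, view, and data values below are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT, customer_name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES ('192-83-7465', 'Johnson', 'Hyderabad')")
# The application queries this view, not the base table:
conn.execute("CREATE VIEW customer_names AS "
             "SELECT customer_id, customer_name FROM customer")

def app_query(conn):
    return conn.execute("SELECT customer_name FROM customer_names").fetchall()

before = app_query(conn)

# Logical schema change: move the data into a restructured base relation,
# then redefine the view on top of it.
conn.executescript("""
    CREATE TABLE customer_core (customer_id TEXT, customer_name TEXT);
    INSERT INTO customer_core SELECT customer_id, customer_name FROM customer;
    DROP VIEW customer_names;
    DROP TABLE customer;
    CREATE VIEW customer_names AS
        SELECT customer_id, customer_name FROM customer_core;
""")
after = app_query(conn)
print(before == after)  # True: the application needed no rewrite
```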

1.8 Database Languages


A database system provides a data definition language to specify the database schema and a
data manipulation language to express database queries and updates. In practice, the data definition
and data manipulation languages are not two separate languages; instead they simply form parts of a
single database language, such as the widely used SQL language.

 Data-Definition Language: We specify a database schema by a set of definitions expressed by a


special language called a data-definition language (DDL). We specify the storage structure and
access methods used by the database system by a set of statements in a special type of DDL called a
data storage and definition language. These statements define the implementation details of the
database schemas, which are usually hidden from the users. The data values stored in the database
must satisfy certain consistency constraints. The DDL provides facilities to specify such
constraints. The database systems check these constraints every time the database is updated.
For example, the following statement in the SQL language defines the account table:
 create table account (account_number char(10), balance integer);
In addition, it updates a special set of tables called the data dictionary or data directory. A data
dictionary contains metadata—that is, data about data. The schema of a table is an example of
metadata. A database system consults the data dictionary before reading or modifying actual data.
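SQLite, for instance, exposes its data dictionary as the catalog table sqlite_master; the schema of the account table defined above can be read back from it (the hyphen in the textbook-style column name is written as an underscore, as SQL identifiers require):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# sqlite_master plays the role of the data dictionary: it stores metadata,
# here the name and defining statement of each table.
row = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'").fetchone()
print(row[0])  # account
print(row[1])  # the original CREATE TABLE statement text
```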

 Data-Manipulation Language: Data manipulation is

 The retrieval of information stored in the database


 The insertion of new information into the database
 The deletion of information from the database
 The modification of information stored in the database

A data-manipulation language (DML) is a language that enables users to access or manipulate data
as organized by the appropriate data model. There are basically two types:
 Procedural DMLs require a user to specify what data are needed and how to get those data.
 Declarative DMLs (also referred to as non-procedural DMLs) require a user to specify
what data are needed without specifying how to get those data. Declarative DMLs are
usually easier to learn and use than are procedural DMLs.

10
Dr.V.Sambasiva Rao, St.Mary’s Engineering College, Hyderabad
However, since a user does not have to specify how to get the data, the database system has to
figure out an efficient means of accessing data. The DML component of the SQL language is
nonprocedural.
A query is a statement requesting the retrieval of information. The portion of a DML that
involves information retrieval is called a query language. Although technically incorrect, it is
common practice to use the terms query language and data manipulation language synonymously.
This query in the SQL language finds the name of the customer whose customer-id is 192-83-7465:
select customer.customer-name
from customer
where customer.customer-id = '192-83-7465'
The query specifies that those rows from the table customer where the customer-id is 192-83-
7465 must be retrieved, and the customer-name attribute of these rows must be displayed. Queries
may involve information from more than one table.
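A query combining information from more than one table can be sketched in SQLite as follows (the customer and depositor tables and their contents are hypothetical, chosen only to illustrate the join):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (customer_id text primary key, customer_name text);
    create table depositor (customer_id text, account_number text);
    insert into customer values ('192-83-7465', 'Johnson');
    insert into depositor values ('192-83-7465', 'A-101');
""")

# Join customer and depositor to find the account(s) of a given customer.
rows = conn.execute("""
    select customer.customer_name, depositor.account_number
    from customer, depositor
    where customer.customer_id = depositor.customer_id
      and customer.customer_id = '192-83-7465'
""").fetchall()
print(rows)  # [('Johnson', 'A-101')]
conn.close()
```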
The levels of abstraction apply not only to defining or structuring data, but also to manipulating
data. At the physical level, we must define algorithms that allow efficient access to data. At higher
levels of abstraction, we emphasize ease of use. The goal is to allow humans to interact efficiently
with the system. The query processor component of the database system translates DML queries into
sequences of actions at the physical level of the database system.

 Database Access from Application Programs:


Application programs are programs that are used to interact with the database. Application
programs are usually written in a host language, such as Cobol, C, C++, or Java. To access the
database, DML statements need to be executed from the host language.
There are two ways to do this:
 By providing an application program interface (set of procedures) that can be used to send
DML and DDL statements to the database, and retrieve the results.
 By extending the host language syntax to embed DML calls within the host language program.
Usually, a special character prefaces DML calls, and a preprocessor, called the DML pre-
compiler, converts the DML statements to normal procedure calls in the host language.
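The first approach, an application program interface, can be sketched with Python's built-in sqlite3 module: the host language sends DDL and DML statements through procedure calls and retrieves the results (the table and values are illustrative):

```python
import sqlite3  # Python's DB-API binding, standing in for a generic API

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The host program sends DDL and DML statements to the database
# through procedure calls and retrieves the results.
cur.execute("create table account (account_number char(10), balance integer)")
cur.execute("insert into account values (?, ?)", ("A-101", 500))
conn.commit()

cur.execute("select balance from account where account_number = ?", ("A-101",))
balance = cur.fetchone()[0]
print(balance)  # 500
conn.close()
```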

1.9 Database Users and Administrators


A primary goal of a database system is to retrieve information from and store new information
in the database. People who work with a database can be categorized into four types based on the way
they expect to interact with the system. Different types of user interfaces have been designed for the
different types of users.
 Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously. The typical user interface for naive users is
a forms interface, where the user can fill in appropriate fields of the form. Naive users may also
simply read reports generated from the database.
 Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program.
 Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language. They submit each such query to a query processor, whose
function is to break down DML statements into instructions that the storage manager understands.
Analysts who submit queries to explore data in the database fall in this category. Online analytical
processing (OLAP) tools simplify analysts’ tasks by letting them view summaries of data in
different ways. Another class of tools for analysts is data mining tools, which help them find
certain kinds of patterns in data.
 Specialized users are sophisticated users who write specialized database applications that do not
fit into the traditional data-processing framework. Among these applications are computer-aided
design systems, knowledge base and expert systems, systems that store data with complex data
types (for example, graphics data and audio data), and environment-modeling systems.

 Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called a
Database Administrator (DBA).

The functions of a DBA include:


• Schema definition. The DBA is responsible for interacting with the users of the system to
understand what data is to be stored in the DBMS and how it is likely to be used. Based on this
knowledge, the DBA must design the conceptual schema (decide what relations to store) and the
physical schema (decide how to store them). The DBA creates the original database schema by
executing a set of data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification (Database Tuning): The DBA carries out
changes to the schema and physical organization to reflect the changing needs of the organization,
or to alter the physical organization to improve performance.
• Granting of authorization for data access (Security): By granting different types of
authorization, the database administrator can regulate which parts of the database various users can
access. The authorization information is kept in a special system structure that the database system
consults whenever someone attempts to access the data in the system.
• Routine maintenance. Some of the database administrator’s routine maintenance activities are:
 Periodically backing up the database, either onto tapes or onto remote servers, to prevent
loss of data in case of disasters such as flooding.
 Ensuring that enough free disk space is available for normal operations, and upgrading disk
space as required.
 Monitoring jobs running on the database and ensuring that performance is not degraded by
very expensive tasks submitted by some users.

1.10 Database System Structure
A database system is partitioned into modules that deal with each of the responsibilities of the
overall system. The functional components of a database system are shown in the following figure.

The DBMS accepts SQL commands generated from a variety of user interfaces, produces query
evaluation plans, executes these plans against the database, and returns the answers.

Figure 1.2 System structure.

The query evaluation engine is important because it helps the database system simplify and
facilitate access to data. When a user issues a query, the parsed query is presented to a query optimizer,
which uses information about how the data is stored to produce an efficient execution plan for
evaluating the query. An execution plan is a blueprint for evaluating a query, usually represented as a
tree of relational operators. Relational operators serve as the building blocks for evaluating queries
posed against the data.

The file and access methods layer supports the concept of a file, which, in a DBMS, is a
collection of pages or a collection of records. Heap files, or files of unordered pages, as well as indexes
are supported. In addition to keeping track of the pages in a file, this layer organizes the information
within a page. It manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.

The buffer manager brings pages in from disk to main memory as needed in response to read
requests. Corporate databases range in size from hundreds of gigabytes to terabytes of data. Since the
main memory of computers cannot store this much information, the information is stored on disks.
Data are moved between disk storage and main memory as needed. The buffer manager is responsible
for deciding what data to cache in main memory. The buffer manager is a critical part of the database
system, since it enables the database to handle data sizes that are much larger than the size of main
memory.
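The page-caching idea behind the buffer manager can be sketched as a toy LRU pool (real buffer managers also track pin counts and dirty pages, which this sketch omits):

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer manager: caches a fixed number of disk pages in memory,
    evicting the least recently used page when the pool is full."""

    def __init__(self, pool_size, disk):
        self.pool_size = pool_size
        self.disk = disk              # simulated disk: page_id -> page data
        self.pool = OrderedDict()     # page_id -> page data, in LRU order

    def read_page(self, page_id):
        if page_id in self.pool:                  # hit: already in memory
            self.pool.move_to_end(page_id)
            return self.pool[page_id]
        if len(self.pool) >= self.pool_size:      # pool full: evict LRU page
            self.pool.popitem(last=False)
        self.pool[page_id] = self.disk[page_id]   # bring page in from disk
        return self.pool[page_id]

disk = {i: f"page-{i}" for i in range(10)}
bm = BufferManager(pool_size=3, disk=disk)
for pid in [0, 1, 2, 0, 3]:      # reading page 3 evicts page 1 (the LRU page)
    bm.read_page(pid)
print(sorted(bm.pool))  # [0, 2, 3]
```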

The disk space manager is the lowest layer of the DBMS software that deals with
management of space on disk, where the data is stored. Higher layers allocate, deallocate, read, and
write pages through (routines provided by) this layer.

The DBMS supports concurrency and crash recovery by carefully scheduling user requests and
maintaining a log of all changes to the database. DBMS components associated with concurrency
control and recovery include Transaction Manager, Lock Manager and Recovery Manager.
The transaction manager ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting. It
also ensures that transactions request and release locks according to a suitable locking protocol and
schedules the execution of transactions.
The lock manager keeps track of requests for locks and grants locks on database objects when
they become available.

The recovery manager is responsible for maintaining a log and restoring the system to a
consistent state after a crash. The disk space manager, buffer manager, and file and access method
layers must interact with these components.

 Data files, which store the database itself.
 Index files, which provide fast access to data items that hold particular values.

CHAPTER–2

Introduction to database design

The entity-relationship (ER) data model allows us to describe the data involved in a real-world
enterprise in terms of objects and their relationships and is widely used to develop an initial database
design.

2.1 DATABASE DESIGN AND ER DIAGRAMS


The database design process can be divided into six steps. The ER model is most relevant to the first
three steps.
1. Requirements Analysis: The very first step in designing a database application is to understand
what data is to be stored in the database, what applications must be built on top of it, and what
operations are most frequent and subject to performance requirements. This process involves
discussions with user groups, a study of the current operating environment and how it is expected to
change, analysis of any available documentation on existing applications that are expected to be
replaced or complemented by the database, and so on.
2. Conceptual Database Design: The information gathered in the requirements analysis step is used
to develop a high-level description of the data to be stored in the database, along with the constraints
known to hold over this data. This step is often carried out using the ER model. The ER model is
one of several high-level or semantic data models used in database design. The goal is to create a
simple description of the data that closely matches how users and developers think of the data. The
initial design must be sufficiently precise to enable a straightforward translation into a data model
supported by a commercial database system.
3. Logical Database Design: We must choose a DBMS to implement our database design, and
convert the conceptual database design into a database schema in the data model of the chosen
DBMS. The task in the logical design step is to convert an ER schema into a relational database
schema, called the logical schema.

2.1.1 Beyond ER Design


Once we have a good logical schema, we must consider performance criteria and design the
physical schema. Finally, we must address security issues and ensure that users are able to access the
data they need, but not data that we wish to hide from them. The remaining three steps of database
design are briefly described next:
1. Schema Refinement: The fourth step in database design is to analyze the collection of relations
to identify potential problems in relational database schema and to refine it. Schema refinement
can be guided by some elegant and powerful theory called normalization in which relations are
restructured to ensure some desirable properties.
2. Physical Database Design: In this step, we consider typical expected workloads that our
database must support and further refine the database design to ensure that it meets desired

performance criteria. This step may simply involve building indexes on some tables and
clustering some tables, or it may involve a substantial redesign of parts of the database schema
obtained from the earlier design steps.
3. Application and Security Design: Any software project that involves a DBMS must consider
aspects of the application that go beyond the database itself. We must identify the entities (e.g.,
users, user groups, departments) and processes involved in the application. We must describe the
role of each entity in every process that is reflected in some application task, as part of a
complete workflow for that task. For each role, we must identify the parts of the database that
must be accessible and the parts of the database that must not be accessible, and we must take
steps to ensure that these access rules are enforced.
Although we might begin with the six-step process outlined here, a complete database design will probably
require a subsequent tuning phase in which all six kinds of design steps are interleaved and repeated
until the design is satisfactory.

2.2 ENTITIES, ATTRIBUTES, AND ENTITY SETS


An entity is an object in the real world that is distinguishable from other objects. Examples
include the following: the toy, the toy department, the manager of the toy department, the home
address of the manager of the toy department. A collection of similar entities is called an entity set.
Properties / characteristics which describe entities are called attributes. An entity is described using a
set of attributes. All entities in a given entity set have the same attributes; our choice of attributes
reflects the level of detail at which we wish to represent information about entities. For example, the
Employees entity set could use name, social security number (ssn), and parking lot (lot) as attributes.
For each attribute associated with an entity set, we must identify a domain of possible values. For
example, the domain associated with the attribute name of Employees might be the set of 20-character
strings.
Each entity set contains at least one set of attributes that uniquely identifies an entity in the
entity set; for each entity set, we choose a key. A candidate key is a minimal set of attributes whose
values uniquely identify an entity in the set. There could be more than one candidate key; if so, we
designate one of them as the primary key.
An entity set is represented by a rectangle, and an attribute is represented by an oval. Each
attribute in the primary key is underlined. The Employees entity set with attributes ssn, name, and lot is
shown in Figure 2.1

Figure 2.1
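When such an entity set is later realized in a relational DBMS, the chosen primary key becomes a SQL primary key constraint. A minimal sketch in SQLite (the column types and sample values are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The Employees entity set of Figure 2.1: attributes ssn, name, and lot,
# with ssn (the underlined attribute) chosen as the primary key.
conn.execute("""
    create table Employees (
        ssn  char(11) primary key,   -- candidate key chosen as primary key
        name char(20),               -- domain: 20-character strings
        lot  integer
    )
""")
conn.execute("insert into Employees values ('123-22-3666', 'Attishoo', 48)")

# The primary key constraint rejects a second entity with the same ssn.
try:
    conn.execute("insert into Employees values ('123-22-3666', 'Smiley', 22)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
conn.close()
```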

2.3 RELATIONSHIPS AND RELATIONSHIP SETS
A relationship is an association among two or more entities. For example, we may have the
relationship that Nikhil works in the pharmacy department. A collection of similar relationships
is called a relationship set. A relationship set can be thought of as a set of n-tuples:

{(e1, ..., en) | e1 ∈ E1, ..., en ∈ En}

Each n-tuple denotes a relationship involving n entities e1 through en, where entity ei is in entity set Ei.
The below figure shows the relationship set Works_In, in which each relationship indicates a
department in which an employee works.

A relationship can also have descriptive attributes. Descriptive attributes are used to record
information about the relationship, rather than about any one of the participating entities;
An instance of a relationship set is a set of relationships. An instance can be thought of as a
'snapshot' of the relationship set at some instant in time. An instance of the Works_In relationship set is
shown in the following Figure.

Each Employees entity is denoted by its ssn, and each Departments entity is denoted by its did,
for simplicity. The since value is shown beside each relationship.
As another example of an ER diagram, suppose that each department has offices in several
locations and we want to record the locations at which each employee works. This relationship is
ternary because we must record an association between an employee, a department, and a location.

The ER diagram for this Works_In2, is shown in the below Figure

The entity sets that participate in a relationship set need not be distinct; sometimes a
relationship might involve two entities in the same entity set. For example, consider the Reports_To
relationship set shown in the following Figure. Since employees report to other employees, every
relationship in Reports_To is of the form (emp1, emp2), where both emp1 and emp2 are entities in
Employees.

However, they play different roles: emp1 reports to the managing employee emp2, which is
reflected in the role indicators supervisor and subordinate in the above Figure. The Reports_To
relationship set has attributes corresponding to the ssn of the supervisor and the ssn of the subordinate,
and the names of these attributes are supervisor-ssn and subordinate-ssn.

2.4 ADDITIONAL FEATURES OF THE ER MODEL
The expressiveness of the ER model is a big reason for its widespread use. It provides constructs
that allow us to describe some subtle properties of the data.

2.4.1 Key Constraints


Consider another relationship set called Manages between the Employees and Departments
entity sets such that each department has at most one manager, although a single employee is allowed
to manage more than one department. The restriction that each department has at most one manager is
an example of a key constraint, and it implies that each Departments entity appears in at most one
Manages relationship in any allowable instance of Manages. This restriction is indicated in the
following ER diagram by using an arrow from Departments to Manages.

Figure 2.6

A relationship set like Manages is sometimes said to be one-to-many, to indicate that one
employee can be associated with many departments (in the capacity of a manager), whereas each
department can be associated with at most one employee as its manager. In contrast, the Works_In
relationship set, in which an employee is allowed to work in several departments and a department is
allowed to have several employees, is said to be many-to-many. If we add the restriction that each
employee can manage at most one department to the Manages relationship set, which would be
indicated by adding an arrow from Employees to Manages in the above Figure, we have a one-to-one
relationship set.
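When the design is realized in SQL, such a key constraint can be enforced by making did the key of the table for Manages, so each department appears in at most one row. A sketch in SQLite (values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each department has at most one manager: did is the key of Manages.
conn.executescript("""
    create table Manages (
        ssn   char(11),
        did   integer primary key,   -- at most one Manages row per department
        since date
    );
    insert into Manages values ('123-22-3666', 51, '2024-01-01');
    insert into Manages values ('123-22-3666', 52, '2024-01-01');
""")
# The second insert above shows one employee managing many departments: allowed.
try:
    # A second manager for department 51 violates the key constraint.
    conn.execute("insert into Manages values ('231-31-5368', 51, '2024-06-01')")
    second_manager_allowed = True
except sqlite3.IntegrityError:
    second_manager_allowed = False
print(second_manager_allowed)  # False
conn.close()
```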

Key Constraints for Ternary Relationships


We can extend this convention-and the underlying key constraint concept-to relationship sets
involving three or more entity sets. In the below Figure, we show a ternary relationship with key
constraints.

Each employee works in at most one department and at a single location. An instance of the
Works_In3 relationship set is shown in the below Figure. Note that each department can be associated
with several employees and locations and each location can be associated with several departments and
employees; however, each employee is associated with a single department and location.

2.4.2 Participation Constraints


The key constraint on Manages tells us that a department has at most one manager. A natural
question to ask is whether every department has a manager. Let us say that every department is
required to have a manager. This requirement is an example of a participation constraint; the
participation of the entity set Departments in the relationship set Manages is said to be total. A
participation that is not total is said to be partial. As an example, the participation of the entity set
Employees in Manages is partial, since not every employee gets to manage a department.

Revisiting the Works_In relationship set, it is natural to expect that each employee works in at
least one department and that each department has at least one employee. This means that the
participation of both Employees and Departments in Works_In is total. The following ER diagram
shows both the Manages and Works _In relationship sets and all the given constraints. If the
participation of an entity set in a relationship set is total, the two are connected by a thick line;
independently, the presence of an arrow indicates a key constraint.

2.4.3 Weak Entities


Thus far, we have assumed that the attributes associated with an entity set include a key. This
assumption does not always hold. The entity set which cannot have a key is called a weak entity.
Suppose that employees can purchase insurance policies to cover their dependents. We wish to
record information about policies, including who is covered by each policy. If an employee quits, any
policy owned by the employee is terminated and we want to delete all the relevant policy and
dependent information from the database.
We might choose to identify a dependent by name alone in this situation, since it is reasonable
to expect that the dependents of a given employee have different names. Thus the attributes of the
Dependents entity set might be pname and age. The attribute pname does not identify a dependent
uniquely. For example, we might have two employees called Surendra and each might have a son
called Ramesh.
Dependents is an example of a weak entity set. A weak entity can be identified uniquely only
by considering some of its attributes in conjunction with the primary key of another entity, which is
called the identifying owner.

The following restrictions must hold:


 The owner entity set and the weak entity set must participate in a one-to-many relationship set
(one owner entity is associated with one or more weak entities, but each weak entity has a single
owner). This relationship set is called the identifying relationship set of the weak entity set.
 The weak entity set must have total participation in the identifying relationship set.
For example, a Dependents entity can be identified uniquely only if we take the key of the owning
Employees entity and the pname of the Dependents entity. The set of attributes of a weak entity set that
uniquely identify a weak entity for a given owner entity is called a partial key of the weak entity set. In
our example, pname is a partial key for Dependents.

The Dependents weak entity set and its relationship to Employees is shown in the above Figure.
The total participation of Dependents in Policy is indicated by linking them with a dark line. The arrow
from Dependents to Policy indicates that each Dependents entity appears in at most one (indeed,
exactly one, because of the participation constraint) Policy relationship. To underscore the fact that
Dependents is a weak entity and Policy is its identifying relationship, we draw both with dark lines. To
indicate that pname is a partial key for Dependents, we underline it using a broken line. This means
that there may well be two dependents with the same pname value.
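These weak-entity rules translate naturally into a table whose key combines the partial key with the owner's key, with deletions cascading from the owner. A sketch in SQLite (names follow the text; column types and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces FKs only with this pragma
conn.executescript("""
    create table Employees (
        ssn  char(11) primary key,
        name char(20)
    );
    -- Dependents is weak: its key is the partial key pname plus the owner's
    -- ssn, and deleting the owner deletes its dependents (on delete cascade).
    create table Dependents (
        pname char(20),
        age   integer,
        ssn   char(11),
        primary key (pname, ssn),
        foreign key (ssn) references Employees (ssn) on delete cascade
    );
    insert into Employees values ('123-22-3666', 'Surendra');
    insert into Dependents values ('Ramesh', 10, '123-22-3666');
""")
# When the owning employee quits, the dependent information goes too.
conn.execute("delete from Employees where ssn = '123-22-3666'")
remaining = conn.execute("select count(*) from Dependents").fetchone()[0]
print(remaining)  # 0
conn.close()
```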
2.4.4 Class Hierarchies
Sometimes it is natural to classify the entities in an entity set into subclasses. For example, we
might want to talk about an Hourly_Emps entity set and a Contract_Emps entity set to distinguish the
basis on which they are paid. We might have attributes hours_worked and hourly_wage defined for
Hourly_Emps and an attribute contractid defined for Contract_Emps.
The semantics of this hierarchy are that every entity in one of these sets is also an Employees
entity and, as such, must have all the attributes of Employees defined. Therefore, the attributes
defined for an Hourly_Emps entity are the attributes of Employees plus those defined specifically for
Hourly_Emps. We say that the attributes of the entity set Employees are inherited by the entity set
Hourly_Emps and that Hourly_Emps ISA (read is a) Employees. A query that asks for all Employees
entities must consider all Hourly_Emps and
Contract_Emps entities as well. The following Figure illustrates the class hierarchy.

A class hierarchy can be viewed in one of two ways:
• Employees is specialized into subclasses. Specialization is the process of identifying subsets of an
entity set (the superclass) that share some distinguishing characteristic. Typically, the superclass is
defined first, the subclasses are defined next, and subclass-specific attributes and relationship sets are
then added.

• Hourly_Emps and Contract_Emps are generalized by Employees. As another example, two entity
sets Motorboats and Cars may be generalized into an entity set Motor_Vehicles. Generalization
consists of identifying some common characteristics of a collection of entity sets and creating a new
entity set that contains entities possessing these common characteristics. Typically, the subclasses are
defined first, the superclass is defined next, and any relationship sets that involve the superclass are
then defined.
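One common way (a sketch, not the only option) to realize such an ISA hierarchy in a relational DBMS stores the subclass-specific attributes in their own table keyed by ssn, recovering the inherited attributes with a join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")
conn.executescript("""
    create table Employees (
        ssn  char(11) primary key,
        name char(20),
        lot  integer
    );
    -- Hourly_Emps ISA Employees: the subclass table stores only the
    -- subclass-specific attributes and shares the superclass key ssn.
    create table Hourly_Emps (
        ssn          char(11) primary key references Employees (ssn),
        hourly_wage  real,
        hours_worked integer
    );
    insert into Employees values ('123-22-3666', 'Attishoo', 48);
    insert into Hourly_Emps values ('123-22-3666', 10.0, 40);
""")
# A query over all Employees naturally includes every Hourly_Emps entity;
# the inherited attributes are recovered with a join on ssn.
row = conn.execute("""
    select e.name, h.hourly_wage
    from Employees e join Hourly_Emps h on e.ssn = h.ssn
""").fetchone()
print(row)  # ('Attishoo', 10.0)
conn.close()
```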

2.4.5 Aggregation
A relationship set is an association between entity sets. Sometimes, we have to model a
relationship between a collection of entities and relationships. Suppose that we have an entity set
called Projects and that each Projects entity is sponsored by one or more departments. The Sponsors
relationship set captures this information. A department that sponsors a project might assign employees
to monitor the sponsorship. Intuitively, Monitors should be a relationship set that associates a Sponsors
relationship (rather than a Projects or Departments entity) with an Employees entity.

Figure 2.13

To define a relationship set such as Monitors, we introduce a new feature of the ER model,
called aggregation. Aggregation allows us to indicate that a relationship set (identified through a
dashed box) participates in another relationship set. This is illustrated in the above Figure, with a
dashed box around Sponsors (and its participating entity sets) used to denote aggregation. This
effectively allows us to treat Sponsors as an entity set for purposes of defining the Monitors
relationship set. We use it when we need to express a relationship among relationships.
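In a relational realization, aggregation can be approximated by letting Monitors reference the key of Sponsors as a whole, so each Monitors row points at one sponsorship rather than at a department or project alone. A sketch in SQLite (attribute names follow the text; values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")
conn.executescript("""
    -- Sponsors relates Departments and Projects; its key (did, pid)
    -- identifies one sponsorship.
    create table Sponsors (
        did integer,
        pid integer,
        primary key (did, pid)
    );
    -- Aggregation treats each Sponsors relationship like an entity:
    -- Monitors references the key of Sponsors as a whole.
    create table Monitors (
        ssn   char(11),
        did   integer,
        pid   integer,
        until date,
        primary key (ssn, did, pid),
        foreign key (did, pid) references Sponsors (did, pid)
    );
    insert into Sponsors values (51, 1);
    insert into Monitors values ('123-22-3666', 51, 1, '2025-01-01');
""")
count = conn.execute("select count(*) from Monitors").fetchone()[0]
print(count)  # 1
conn.close()
```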

2.5 CONCEPTUAL DESIGN WITH THE ER MODEL


Developing an ER diagram presents several choices, including the following:
 Should a concept be modeled as an entity or an attribute?
 Should a concept be modeled as an entity or a relationship?
 What are the relationship sets and their participating entity sets?
 Should we use binary or ternary relationships?
 Should we use aggregation?

2.5.1 Entity versus Attribute
While identifying the attributes of an entity set, it is sometimes not clear whether a property
should be modeled as an attribute or as an entity set (and related to the first entity set using a
relationship set). For example, consider adding address information to the Employees entity set. One
option is to use an attribute address. This option is appropriate if we need to record only one address
per employee, and it suffices to think of an address as a string. An alternative is to create an entity set
called Addresses and to record associations between employees and addresses using a relationship.
This more complex alternative is necessary in two situations:
• We have to record more than one address for an employee.
• We want to capture the structure of an address in our ER diagram.
For another example of when to model a concept as an entity set rather than as attribute,
consider the relationship set (called Works_In4) shown in the following Figure.

It differs from the Works_In relationship set of Figure 2.2 only in that it has attributes from and
to, instead of since. Intuitively, it records the interval during which an employee works for a
department. Now suppose that it is possible for an employee to work in a given department over more
than one period. We can address this problem by introducing an entity set called, say, Duration, with
attributes from and to, as shown in the following Figure.

2.5.2 Entity versus Relationship
Consider the relationship set called Manages in Figure 2.6. Suppose that each department
manager is given a discretionary budget (dbudget). Given a department, we know the manager, as well
as the manager's starting date and budget for that department. This approach is natural if we assume
that a manager receives a separate discretionary budget for each department that he or she manages,
which is shown in the following Figure, in which we have renamed the relationship set to Manages2.

But what if the discretionary budget is a sum that covers all departments managed by that
employee? In this case, each Manages2 relationship that involves a given employee will have the same
value in the dbudget field, leading to redundant storage of the same information. Another problem with
this design is that it is misleading; it suggests that the budget is associated with the relationship, when
it is actually associated with the manager.
We can address these problems by introducing a new entity set called Managers (which can be
placed below Employees in an ISA hierarchy, to show that every manager is also an employee). The
attributes since and dbudget now describe a manager entity.

2.5.3 Binary versus Ternary Relationships


Consider the following ER diagram. It models a situation in which an employee can own
several policies, each policy can be owned by several employees, and each dependent can be covered
by several policies.

Suppose that we have the following additional requirements:
 A policy cannot be owned jointly by two or more employees.
 Every policy must be owned by some employee.
 Dependents is a weak entity set, and each dependent entity is uniquely identified by taking
pname in conjunction with the policyid of a policy entity (which, intuitively, covers the given
dependent).
The first requirement suggests that we impose a key constraint on Policies with respect to Covers,
but this constraint has the unintended side effect that a policy can cover only one dependent. The
second requirement suggests that we impose a total participation constraint on Policies. This solution is
acceptable if each policy covers at least one dependent. The third requirement forces us to introduce an
identifying relationship that is binary. The best way to model this situation is to use two binary
relationships, as shown in the following Figure.

2.5.4 Aggregation versus Ternary Relationships


The choice between using aggregation or a ternary relationship is mainly determined by the
existence of a relationship that relates a relationship set to an entity set (or second relationship set).
The choice may also be guided by certain integrity constraints that we want to express. For example,
consider the ER diagram shown in Figure 2.13. According to this diagram, a project can be sponsored
by any number of departments, a department can sponsor one or more projects, and each sponsorship is
monitored by one or more employees. If we don't need to record the until attribute of Monitors, then
we might reasonably use a ternary relationship, say, Sponsors2, as shown in the following Figure.

Consider the constraint that each sponsorship (of a project by a department) be monitored by at
most one employee. We cannot express this constraint in terms of the Sponsors2 relationship set. On
the other hand, we can easily express the constraint by drawing an arrow from the aggregated
relationship Sponsors to the relationship Monitors in Figure 2.13. Thus, the presence of such a
constraint serves as another reason for using aggregation rather than a ternary relationship set.

2.6 CONCEPTUAL DESIGN FOR LARGE ENTERPRISES


An important aspect of the design process is the methodology used to structure the
development of the overall design and ensure that the design takes into account all user requirements
and is consistent. The usual approach is that the requirements of various user groups are considered,
any conflicting requirements are somehow resolved, and a single set of global requirements is
generated at the end of the requirements analysis phase. Generating a single set of global requirements
is a difficult task, but it allows the conceptual design phase to proceed with the development of a
logical schema that spans all the data and applications throughout the enterprise.
An alternative approach is to develop separate conceptual schemas for different user groups
and then integrate these conceptual schemas. To integrate multiple conceptual schemas, we must
establish correspondences between entities, relationships, and attributes, and we must resolve
numerous kinds of conflicts (e.g., naming conflicts, domain mismatches, and differences in
measurement units). This task is difficult in its own right. In some situations, schema integration
cannot be avoided; for example, when one organization merges with another, existing databases may
have to be integrated. Schema integration is also increasing in importance as user’s demand access to
heterogeneous data sources, often maintained by different organizations.

