Cs8492 - Dbms Book
Cs8492 - Dbms Book
Anna University
Choice Based Credit System (CBCS)
Semester - IV (CSE / IT)
® TM
ISBN 978-93-332-2129-0
TECHNICAL
PUBLICATIONS
An Up-Thrust for Knowledge
9 789333 221290
(i)
Database Management Systems
Semester - IV (CSE / IT)
All publishing rights (printed and ebook version) reserved with Technical Publications. No part of this book
should be reproduced in any form, Electronic, Mechanical, Photocopy or any information storage and
retrieval system without prior permission in writing, from Technical Publications, Pune.
Published
®
by : TM
Amit Residency, Office No.1, 412, Shaniwar Peth, Pune - 411030, M.S. INDIA
TECHNICAL Ph.: +91-020-24495496/97, Telefax : +91-020-24495497
PUBLICATIONS Email : sales@technicalpublications.org Website : www.technicalpublications.org
ISBN 978-93-332-2129-0
9 7 8 9 3 3 3 2 2 1 2 9 0
AU 17
Syllabus
Purpose of Database System - Views of data - Data Models- Database System Architecture -
Introduction to relational databases - Relational Model - Keys - Relational Algebra - SQL
fundamentals - Advanced SQL features - Embedded SQL - Dynamic SQL.
Contents
1.1 Introduction to Database Management
1.2 Purpose of Database System ...................................... May-07, 12, Dec.-04, ........... Marks 8
1.3 Views of Data ................................................................ May-16 ............................... Marks 16
1.4 Data Models ................................................................ Dec.-14, ................................ Marks 8
1.5 Database System Architecture .................................... May-12, 13, 14, 16, 17,
................................................................................. Dec.-08, 15, 17,................... Marks 16
1.6 Data Independence
1.7 Introduction to Relational Databases
1.8 Relational Model
1.9 Keys ........................................................................... May-06, 07, 12, Dec.-06, ..... Marks 4
1.10 Integrity Constraints ................................................. Dec.-05, ............................. Marks 10
1.11 Database integrity
1.12 Relational Algebra..................................................... May-03,04,05,14,15,16,17,18,
................................................................................. Dec.-02,07,08,11,15,16,17 . Marks 16
1.13 SQL Fundamentals ................................................... Dec.-15,............................... Marks 16
1.14 Advanced SQL Features
1.15 Dynamic SQL ............................................................. May-17, Dec.-17, ................ Marks 11
1.16 Two Marks Questions with Answers
(1 - 1)
Database Management Systems 1-2 Relational Databases
Earlier database systems are created in response to manage the commercial data.
These data is typically stored in files. To allow users to manipulate these files
various programs are written for
Database Management Systems 1-3 Relational Databases
8) Security problems : Every user is not allowed to access all the data of database
system. Since application program in file system are added in an ad hoc manner,
enforcing such security constraints become difficult.
Database systems offer solutions to all the above mentioned problems.
Difference between Database System and Conventional File System
Sr. No. Database systems Conventional file systems
3. Database systems are used when Conventional file systems are used where
security constraints are high. there is less demand for security constraints.
4. Database systems define the data in a File systems define the data in un-structured
structured manner. Also there is well manner. Data is usually in isolated form.
defined co-relation among the data.
5. Data inconsistency is less in database Data inconsistency is more in file systems.
systems.
6. User is unknown to the physical User locates the physical address of file to
address of the data used in database access the data in conventional file systems.
systems.
7. We can retrieve the data in any desired We cannot retrieve the data in any desired
format using database systems. format using file systems.
8. There is ability to access the data There is no ability to concurrently access the
concurrently using database systems. data using conventional file system.
3) Data can be isolated in separate tables for convenient and efficient use.
4) Data can be accessed efficiently using a simple query language.
5) The data integrity can be maintained. That means – the constraints can be applied
on data and it should be in some specific range.
6) The atomicity of data can be maintained. That means, if some operation is
performed on one particular table of the database, then the change must be reflected
for the entire database.
7) The DBMS allows concurrent access to multiple users by using the synchronization
technique.
8) The security policies can be applied to DBMS to allow the user to access only
desired part of the database system.
Disadvantages of Database Systems
1) Complex design : Database design is complex, difficult and time consuming.
2) Hardware and software cost : Large amount of investment is needed to setup the
required hardware or to repair software failure.
3) Damaged part : If one part of database is corrupted or damaged, then entire
database may get affected.
4) Conversion cost : If the current system is in conventional file system and if we need
to convert it to database systems then large amount of cost is incurred in purchasing
different tools, and adopting different techniques as per the requirement.
5) Training : For designing and maintaining the database systems, the people need to
be trained
University Questions
1. Compare file system with database system AU : May-07, Marks 8, May-12, Marks 2
Database is a collection of interrelated data and set of programs that allow users to
access or modify the data.
Abstract view of the system is a view in which the system hides certain details of
how the data are stored and maintained.
The main purpose of database systems is to provide users with abstract view of
the data.
The view of the system helps the user to retrieve data efficiently.
For simplifying the user interaction with the system there are several levels of
abstraction - these levels are - Physical level, logical level and view level.
Database Management Systems 1-6 Relational Databases
o This is the next higher level, which describes the what data are stored in database.
o This level also describes the relationship among the data.
o The logical level thus describes then entire database in terms of small number of
relatively simple structures.
The database administrators use logical level of abstraction for deciding what
information to keep in database.
3) View level :
o This is highest level of abstraction that describes only part of the entire database.
o The view level can provide the access to only part of the database.
o This level helps in simplifying the interaction with the system.
o The system can provide multiple views of the same system.
o Clerk at the reservation system, can see only part of the database and can access the
required information of the passenger.
Fig. 1.3.1 shows the relationship among the three levels of abstraction.
Instances : When information is inserted or deleted from the database then the
database gets changed. The collection of information at particular moment is called
instances. For example - following is an instance of student database
RollNo Name Marks
10 AAA 43
20 BBB 67
Database Management Systems 1-8 Relational Databases
Types of Schema : The database has several schema based on the levels of abstraction.
(1) Physical Schema : The physical schema is a database design described at the
physical level of abstraction.
(2) Logical Schema : The logical schema is a database design at the logical level of
abstraction.
(3) Subschema : A database may have several views at the view level which are
called subschemas.
University Question
1. Briefly explain about views of data. AU : May-16, Marks 16
Advantages :
(i) Structural Independence : Structural independence is an ability that allows us to
make changes in one database structure without affecting other. The relational
model have structural independence. Hence making required changes in the
database is convenient in relational database model.
(ii)Conceptual Simplicity : The relational model allows the designer to simply focus
on logical design and not on physical design. Hence relational models are
conceptually simple to understand.
Database Management Systems 1 - 10 Relational Databases
(iii) Query Capability : Using simple query language (such as SQL) user can get
information from the database or designer can manipulate the database structure.
(iv) Easy design,maintenance and usage : The relational models can be designed
logically hence they are easy to maintain and use.
Disadvantages :
(i) Relational model requires powerful hardware and large data storage devices.
(ii) May lead to slower processing time.
(iii) Poorly designed systems lead to poor implementation of database systems.
Advantages :
i) Simple : It is simple to draw ER diagram when we know entities and relationships.
ii) Easy to understand : The design of ER diagram is very logical and hence they are
easy to design and understand.
iii) Effective: It is effective communication tool.
iv) Integrated : The ER model can be easily integrated with Relational model.
v) Easy conversion: ER model can be converted easily into other type of models.
Disadvantages :
i) Loss of information : While drawing ER model some information can be hidden
or lost.
ii) Limited relationships : The ER model can represent limited relationships as
compared to other models.
Database Management Systems 1 - 11 Relational Databases
Advantages :
i) Enriched modelling : The object based data model has capability of modelling the
real world objects.
ii) Reusability : There are certain features of object oriented design such as inheritance,
polymorphism which help in reusability.
iii) Support for schema evolution : There is a tight coupling between data and
applications, hence there is strong support for schema evolution.
iv) Improved performance : Using object based data model there can be significant
improvement in performance using object based data model.
Disadvantages :
i) Lack of universal data model : There is no universally agreed data model for an
object based data model, and most models lack a theoretical foundation.
ii) Lack of experience : In comparison with relational database management the use of
object based data model is limited. This model is more dependent on the skilled
programmer.
iii)Complex : More functionalities present in object based data model make the design
complex.
Advantages
i) Data is not constrained by fixed schema.
ii) It is flexible.
iii) It is portable.
Disadvantage
i) Queries are less efficient than other types of data model.
University Question
1. Write short note on : Data model and its types. AU : Dec.-14, Marks 8
1.5 Database System Architecture AU : May-12, 13, 14, 16, 17, Dec.-08, 15, 17
• The typical structure of typical DBMS is based on relational data model as shown in
Fig. 1.5.1. (Refer page 1-14).
• Consider the top part of Fig. 1.5.1. It shows application interfaces used by naïve
users, application programs created by application programmers, query tools used
by sophisticated users and administration tools used by database administrator
• The two important components of database architecture are - Query processor and
storage manager.
Query processor :
The interactive query processor helps the database system to simplify and
facilitate access to data. It consists of DDL interpreter, DML compiler and query
evaluation engine.
ii) DML compiler : It translates DML statements query language into an evaluation
plan. This plan consists of the instructions which query evaluation engine
understands.
Database Management Systems 1 - 13 Relational Databases
iii) Query evaluation engine : It executes the low-level instructions generated by the
DML compiler.
When a user issues a query, the parsed query is presented to a query optimizer,
which uses information about how the data is stored to produce an efficient
execution plan for evaluating the query. An execution plan is a blueprint for
evaluating a query. It is evaluated by query evaluation engine.
Storage manager :
ii) Transaction manager : Ensures that the database remains in consistent despite
of system failures and concurrent transaction execution proceeds without
conflicting.
iv) Buffer manager : Manages the fetching of data from disk storage into main
memory. The buffer manager also decides what data to cache in main memory.
Buffer manager is a crucial part of database system.
ii) Data dictionary : Used for storing metadata, particularly schema of database.
iii) Indices : Indices are used to provide fast access to data items present in the
database
Database Management Systems 1 - 14 Relational Databases
University Questions
1. Explain the overall architecture of database system in detail.
AU : May-14,17, Dec.-17, Marks 8, May-16, Marks 16
2. With the help of a neat block diagram explain basic architecture of a database management system.
AU : May-12, May 13, Marks 16,Dec.-15, Marks 8
Q. Explain component modules of a DBMS and their interactions with the architecture
AU : Dec 08, Marks 10
By these data independence the time and cost acquired by changes in any one level can
be reduced and abstract view of data can be provided to the user.
103 Electrical 5
104 Civil 3
From this third table we can easily find out that the course to which the RollNo 001 is
admitted is computer Science.
There are some commonly used terms in Relational Model and those are -
Tuple or record or row : The single entry in the table is called tuple. The tuple
represents a set of related data. In above Student table there are four tuples. One of the
tuple can be represented as
001 AAA 88 1111111111
Attribute or columns : It is a part of table that contains several records. Each record
can be broken down into several small parts of data known as attributes. For example the
above table consists of four attributes such as RollNo,Name,Marks and Phone.
Relation schema : A relation schema describes the structure of the relation, with the
name of the relation (i.e. name of table), its attributes and their names and type.
Relation Instance : It refers to specific instance of relation i.e. containing a specific set
of rows. For example – the following is a relation instance – which contains the records
with marks above 80.
Database Management Systems 1 - 18 Relational Databases
Degree : It is nothing but total number of columns present in the relational database. In
given Student table –
The degree is 4.
iv) Degree
v) Cardinality
Solution :
i) No of Columns = 6
ii) No of Tuples= 3
iii) Different attributes are StaffID, Name,Sex, Designation, Salary, DOJ
iv) Degree= Total number of columns=6
v) Cardinality =Total number of rows = 3
Keys are used to specify the tuples distinctly in the given relation.
Various types of keys used in relational model are – Superkey, Candidate Keys,
primary keys, foreign keys. Let us discuss them with suitable example
1) Super Key(SK): It is a set of one or more attributes within a table that can uniquely
identify each record within a table. For example – Consider the Student table as
follows –
Reg No. Roll No Phone Name Marks
Clearly using the (RegNo) and (RollNo,Phone,Name) we can identify the records
uniquely but (Name, Marks) of two students can be same, hence this combination
not necessarily help in identifying the record uniquely.
Thus every candidate key is a superkey but every superkey is not a candidate key.
3) Primary Key(PK): The primary key is a candidate key chosen by the database
designer to identify the tuple in the relation uniquely. For example – Consider the
following representation of primary key in the student table
Database Management Systems 1 - 21 Relational Databases
Other than the above mentioned primary key, various possible primary keys can be
(RollNo), (RollNo,Name), (RollNo, Phone)
The relation among super key, candidate key and primary can be denoted by
Candidate Key=Super Key – Primary Key
Rules for Primary Key
(i) The primary key may have one or more attributes.
(ii) There is only one primary key in the relation.
(iii) The value of primary key attribute can not be NULL.
(iv) The value of primary key attribute does not get changed.
4) Alternate key : The alternate key is a candidate key which is not chosen by the
database designer to uniquely identify the tuples. For example –
Database Management Systems 1 - 22 Relational Databases
5) Foreign key : Foreign key is a single attribute or collection of attributes in one table
that refers to the primary key of other table.
Thus foreign keys refer to primary key.
The table containing the primary key is called parent table and the table
containing foreign key is called child table.
Example -
From above example, we can see that two tables are linked. For instance we could
easily find out that the ‘Student CCC has opted for ComputerSci course’
University Question
1. Explain distinction among the terms primary key, candidate key, foreign key and super key with
suitable example AU : May-06, 07, 12, Dec.-06, Marks 4
The primary key value helps in uniquely identifying every row in the table. Thus if the
users of the database want to retrieve any row from the table or perform any action on
that table, they must know the value of the key for that row. Hence it is necessary that the
primary key should not have the NULL value.
University Question
1. Discuss the entity Integrity and referential integrity constraints. Why are they important ? Explain
them with suitable examples. AU : Dec.-05, Marks 10
Customer
Order
Note that the "CustID" column in the "Order" table points to the "CustID" column
in the "Customer" table.
The "CustID" column in the "Customer" table is the PRIMARY KEY in the
"Customer" table.
The "CustID" column in the "Order" table is a FOREIGN KEY in the "Order" table.
The table containing the foreign key is called the child table, and the table
containing the primary key is called the referenced or parent table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links
between tables.
Database Management Systems 1 - 25 Relational Databases
The FOREIGN KEY constraint also prevents invalid data from being inserted into
the foreign key column, because it has to be one of the values contained in the
table it points to.
The queries present in the relational algebra are denoted using operators.
Each relational algebra is procedural. That means Each relational query describes
a step-by-step procedure for computing the desired answer, based on the order in
which operators are applied in the query.
(1) Selection :
This operation is used to fetch the rows or tuples from the table(relation).
Syntax : The syntax is
predicate(relation)
where σ represents the select operation. The predicate denotes some logic using
which the data from the relation(table) is selected.
For example - Consider the relation student as follows
2 Shyam 18 Male
3 Seeta 16 Female
4 Geeta 23 Female
sname age
Ram 21
Shyam 18
Seeta 16
Geeta 23
Fig. 1.12.2
Student Reserve
sid sname age sid isbn day
1 Ram 21 1 005 07-07-18
4 Geeta 23
Query : Find the names of all the students who have reserved isbn = 005. To
satisfy this query we need to extract data from two table. Hence the cartesian
product operator is used as
(Student.sid = Reserve.sid
^
Reserve.Isbn =005(Student × Reserve)
o For this operation to work, the relations(tables) specified should have same
number of attributes(columns) and same attribute domain. Also the duplicate
tuples are automatically eliminated from the result.
o Syntax : A ∪ B
o where A and B are relations.
o For example : If there are two tables student and book as follows –
Database Management Systems 1 - 29 Relational Databases
Student Book
sid sname age isbn bname Author
1 Ram 21 005 DBMS XYZ
4 Geeta 23
o Query : We want to display both the student name and book names from both the
tables then
Sname(Student) ∪ bname (Book)
(ii) Intersection :
o This operation is used to fetch data from both tables which is common in both
the tables.
o Syntax : A ∩ B
where A and B are relations.
o Query : If we want to find out the names of the students who are working in a
company then
name(Student) ∩ name (Worker)
Name
AAA
DDD
(iii) Set-Difference : The result of set difference operation is tuples, which are
present in one relation but are not in the second relation.
Database Management Systems 1 - 30 Relational Databases
Syntax : A – B
For Example : Consider two relations Full_Time_Employee and Part_Time_Employee, if
we want to find out all the employee working for Fulltime, then the set difference
operator is used -
EmpName(Full_Time_Employee) - EmpName (Part_Time_Employee)
(5) Join : The join operation is used to combine information from two or more relations.
Formally join can be defined as a cross-product followed by selections and projections,
joins arise much more frequently in practice than plain cross-products. The join operator
is used as ⋈
There are three types of joins used in relational algebra
i) Conditional join : This is an operation in which information from two tables is
combined using some condition and this condition is specified along with the join
operator.
A ⋈cB = c(A × B)
Thus ⋈ is defined to be a cross-product followed by a selection. Note that the
condition c can refer to attributes of both A and B. The condition C can be specified
using <,<=,>,<= or = operators.
For example consider two table student and reserve as follows -
Student Reserve
sid sname age sid isbn day
1 Ram 21 1 005 07-07-18
4 Geeta 23
If we want the names of students with sid(Student) = sid(Reserve) and isbn = 005,
then we can write it using Cartesian product as -
ii) Equijoin : This is a kind of join in which there is equality condition between two
attributes(columns) of relations(tables). For example - If there are two table Book
and Reserve table and we want to find the book which is reserved by the student
having isbn 005 and name of the book is ‘DBMS’, then :
Book Reserve
isbn bname Author sid isbn day
005 DBMS XYZ 1 005 07-07-18
iii)Natural Join : When there are common columns and we have to equate these
common columns then we use natural join. The symbol for natural join is simply ⋈
without any condition. For example, consider two tables -
Book Reserve
isbn bname Author sid isbn day
Now if we want to list the books that are reserved, then that means we want to
match Books.isbn with Reserve.isbn. Hence it will be simply
Database Management Systems 1 - 32 Relational Databases
Books ⋈ Reserve
(6) Rename operation : This operation is used to rename the output relation for any
query operation which returns result like Select, Project etc. Or to simply rename a
relation(table). The operator (rho) is used for renaming.
Syntax : (RelationNew, RelationOld)
For example : If you want to create a relation Student_names with sid and sname from
Student, it can be done using rename operator as :
ρ(Student_names, (sid.sname(Student))
(7) Divide operation
The division operator is used when we have to evaluate queries which contain the
keyword ALL.
It is denoted by A/B where A and B are instances of relation.
For example - Find all the customers having accounts in all the branches. For that
consider two tables - Customer and Account as
Customer Account
Name Branch Branch
A Pune Pune
B Mumbai Mumbai
A Mumbai
C Pune
(iii) Find the id of sailors with age over 20 who have not reserved red boat
(iv) Find the names of sailors who have reserved at least one boat
Solution :
(i) (sname((bid=103 Reserves) ⋈ Sailors)
Solution :
a. Select each student who takes at least one course in 2009, display the student
information along with the information about what the courses the student took.
b. Select each student who takes at least one course in 2009, display the student
information along with the information about what the courses the student took but
the selection must be before join operation.
c. Display the ID, Name and Course_id of all the students who took any course in the
university.
Example 1.12.3 Consider following relational database
branch(branch_name, branch_city, assets)
customer (customer_name, customer_street, customer_city)
loan (loan_number, branch_name, amount)
borrower (customer_name, loan_number)
account (account_number, branch_name, balance)
depositor (customer_name, account_number)
i) Find the names of all branches located in “Chennai”.
ii) Find the names of all borrowers who have a loan in branch “ABC”.
Solution :
i) branch_name(branch_city =’Chennai’) (branch))
Solution :
a) person-name(company-name = “First Bank Corporation”(works))
b) person-name, street, city(company-name = “First Bank Corporation” salary > 200000 (works ⋈ employee))
University Questions
1. Explain select, project, cartesian product and join operations in relational algebra with an example
AU : May-18, Marks 13, Dec.-16, Marks 6
2. List operations of relational algebra and purpose of each with example
AU : May-17, Marks 5
3. Differentiate between foreign key constraints and referential Integrity constraints with suitable
example.
AU : Dec.-17, Marks 6
4. Explain various operations in relational algebra with examples
AU : May 03, Marks 10, Dec-07, Marks 8, Dec.- 08, Marks 10, May-14, Marks 16
5. Explain all join operations in relational algebra
AU : May 05, Marks 8
6. Briefly explain relational algebra
AU : May 04, Marks 8
7. What is rename operation in relational algebra ? Illustrate your answer with example
AU : Dec 02, Marks 2
Database Management Systems 1 - 36 Relational Databases
(5) smallint : It is used to store small integer value. It allows machine dependent subset
of integer type.
(6) real : It allows the floating point, double precision numbers.
(7) float(n) : For representing the floating point number with precision of at least n
digits this data type is used.
Syntax
create table table_name;
Example
create table Student
(RollNo int,
Name varchar(10),
Marks numeric(3,2),
Primary key(RollNo));
The primary key attribute must be non null and unique.
(2) Insert : The insert command is used to insert data into the table. There are two
syntaxes of inserting data into SQL
Syntax
i) Insert into table_name (column1, column2, column3, ...)
values (value1, value2, value3, ...);
ii) insert into table_name
values (value1, value2, value3, ...);
Example
(i) insert into Student(RollNo,Name,Makrs) values(101,’AAA’,56.45)
(ii) insert into Student values(101,’AAA’,56.45)
(3) Delete : This command is used to delete the existing record.
Syntax
delete from table_name
where condition;
Example
Delete from student
where RollNo=10
(4) Alter: The alter table statement is used to add, delete, or modify columns in an
existing table.
The alter table statement is also used to add and drop various constraints on an existing
table.
Database Management Systems 1 - 39 Relational Databases
3 Seeta 16 3 009
4 Geeta 23
Query : Find the names of students who have reserved the books with book isbn
Select Student.sname,Reserve.isbn
From Student, Reserve
Where Student.sid=Reserve.sid
Use of SQL Join
The SQL Joins clause is used to combine records from two or more tables in a database. A
JOIN is a means for combining fields from two tables by using values common to each.
Example : Consider two tables for using the joins in SQL. Note that cid is common
column in following tables.
Student Reserve
4 NULL Geeta
1) Inner Join :
The most important and frequently used of the joins is the INNER JOIN. They are
also known as an EQUIJOIN.
The INNER JOIN creates a new result table by combining column values of two
tables (Table1 and Table2) based upon the join-predicate.
The query compares each row of table1 with each row of Table2 to find all pairs of
rows which satisfy the join-predicate.
Database Management Systems 1 - 41 Relational Databases
When the join-predicate is satisfied, column values for each matched pair of rows
of A and B are combined into a result row. It can be represented as :
2) Left Join :
The SQL LEFT JOIN returns all rows from the left table, even if there are no
matches in the right table. This means that if the ON clause matches 0 (zero)
records in the right table; the join will still return a row in the result, but with
NULL in each column from the right table.
This means that a left join returns all the values from the left table, plus matched
values from the right table or NULL in case of no matching join predicate.
It can be represented as -
FROM Table1
LEFT JOIN Table2
ON Table1.common_field = Table2.common_field;
Example : For above given two tables namely Student and City, we can apply Left
join. It will Return all records from the left table, and the matched records from
the right table using the common column cid. The query will be
SELECT *
FROM Student Left Join City on Student.cid=City.cid
The result will be
3) Right Join :
The SQL RIGHT JOIN returns all rows from the right table, even if there are no
matches in the left table.
This means that if the ON clause matches 0 (zero) records in the left table; the join
will still return a row in the result, but with NULL in each column from the left
table.
This means that a right join returns all the values from the right table, plus
matched values from the left table or NULL in case of no matching join predicate.
It can be represented as follows :
4) Full Join :
The SQL FULL JOIN combines the results of both left and right outer joins.
The joined table will contain all records from both the tables and fill in NULLs for
missing matches on either side.
It can be represented as
Select S.sname,R.isbn
From Student as S, Reserve as R
Where S.sid=R.sid
In above case we could shorten the names of tables Student and Reserve as S and R
respectively.
Another reason to rename a relation is a case where we wish to compare tuples in the
same relation. We then need to take the Cartesian product of a relation with itself. For
example –
If the query is – Find the names of students who reserve the book of isbn 005. Then the
SQL statement will be –
Select S.sname,R.isbn
From Student as S, Reserve as R
Where S.sid=R.sid and S.isbn=005
2) Attribute Specification in Select clause : The symbol * is used in select clause to
denote all attributes. For example – To select all the records from Student table we
can write
Select* from Student
3) Ordering the display of tuples : For displaying the records in particular order we
use order by clause.
The general syntax with ORDER BY is
SELECT column_name(s)
FROM table_name
WHERE condition
ORDER BY column_name(s)
Example : Consider the Student table as follows –
Database Management Systems 1 - 45 Relational Databases
2 BBB 70 Mumbai
3 CCC 90 Pune
4 DDD 55 Mumbai
Database Management Systems 1 - 46 Relational Databases
Syntax of AND
SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2 AND condition3 ...;
Example : Find the student having name “AAA” and lives in city “Pune”
SELECT *
FROM Students
Where sname=’AAA’ AND city=’Pune’
Output
Syntax OR
SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2 OR condition3 ...;
Example : Find the student having name “AAA” OR lives in city “Pune”
SELECT *
FROM Students
Where sname=’AAA’ OR city=’Pune’
Output
3 CCC 90 Pune
Syntax NOT
SELECT column1, column2, ..
FROM table_name
WHERE NOT condition
Example : Find the student who do not have city “Pune”
SELECT *
FROM Students
Where NOT city=’Pune’
Output
The second part of the definition means, for example, that the set of fields
{RollNo, Name} is not a key for Students, because this set properly contains the
key {RollNo}. The set {RollNo, Name} is an example of a superkey, which is a set
of fields that contains a key.
The key constraint can be specified using SQL as follows -
o In SQL, we can declare that a subset of the columns of a table constitute a key by
using the UNIQUE constraint.
o At most one of these candidate keys can be declared to be a primary key, using
the PRIMARY KEY constraint. For example -
CREATE TABLE Student(RollNo integer,
Name CHAR(20),
age integer,
UNIQUE(Name,age),
CONSTRAINT StudentKey PRIMARY KEY(RollNo))
This definition says that RollNo is a Primary key and Combination of Name and
age is also a key.
o ‘Data%’ matches any string beginning with “Data”, For instance it could
be with “Database”, “DataMining”,”DataStructure”
o ‘_ _ _’ matches any string of exactly three characters.
o ‘_ _ _ %’matches any string of at least length 3 characters.
Database Management Systems 1 - 49 Relational Databases
The LIKE clause can be used in WHERE clause to search for specific patterns.
For example – Consider following Employee Database
(1) Find all the employee with EmpName starting with “s”
SQL Statement:
SELECT * FROM Employee
WHERE EmpName LIKE ‘s%’
Output
EmpID EmpName Department Date_of_Join
1 Sunil Marketing 1-Jan
3 Supriya Manager 3-Jan
4 Sonia Accounts 4-Jan
5 Suraj Sales 5-Jan
(2) Find the names of employee whose name begin with S and end with a
SQL Statement :
SELECT EmpName FROM Employee
WHERE EmpName LIKE ‘S%a’
Output
EmpName
Supriya
Sonia
(3) Find the names of employee whose name begin with S and followed by exactly
four characters
SELECT EmpName FROM Employee
WHERE EmpName LIKE ‘S_ _ _ _ ‘
Database Management Systems 1 - 50 Relational Databases
Output
EmpName
Sunil
Sonia
Suraj
Syntax
The basic syntax of a UNION clause is as follows –
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
UNION
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
Here, the given condition could be any given expression based on your requirement.
Consider Following relations –
Student Reserve
sid sname age sid isbn day
1 Ram 21 1 005 07-07-18
4 Geeta 23
Database Management Systems 1 - 51 Relational Databases
Book
isbn bname Author
005 DBMS XYZ
006 OS PQR
Example : Find the names of the students who have reserved the ‘DBMS’ book or ‘OS’
Book
The query can then be written by considering the Student, Reserve and Book table as
SELECT S.sname
FROM Student S, Reserve R, Book B
WHERE S.sid=R.sid AND R.isbn=B.isbn AND B.bname=’DBMS’
UNION
SELECT S.sname
FROM Student S, Reserve R, Book B
WHERE S.sid=R.sid AND R.isbn=B.isbn AND B.bname=’OS’
2) Intersect : The common entries between the two tables can be represented with the
help of Intersect operator. It replaces the AND operator in the query.
Syntax
The basic syntax of a INTERSECT clause is as follows –
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
INTERSECT
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
Example : Find the students who have reserved both the ‘DBMS’ book and ‘OS’ Book
The query can then be written by considering the Student, Reserve and Book table as
SELECT S.sid, S.sname
FROM Student S, Reserve R, Book B
WHERE S.sid=R.sid AND R.isbn=B.isbn AND B.bname=’DBMS’
INTERSECT
SELECT S.sname
FROM Student S, Reserve R, Book B
WHERE S.sid=R.sid AND R.isbn=B.isbn AND B.bname=’OS’
3) Except : The EXCEPT clause is used to represent the set-difference in the query.
This query is used to represent the entries that are present in one table and not in other.
Database Management Systems 1 - 52 Relational Databases
Syntax :
The basic syntax of a EXCEPT clause is as follows –
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
EXCEPT
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
Example : Find the students who have reserved both the ‘DBMS’ book but not reserved
‘OS’ Book
The query can then be written by considering the Student, Reserve and Book table as
SELECT S.sid, S.sname
FROM Student S, Reserve R, Book B
WHERE S.sid=R.sid AND R.isbn=B.isbn AND B.bname=’DBMS’
EXCEPT
SELECT S.sname
FROM Student S, Reserve R, Book B
WHERE S.sid=R.sid AND R.isbn=B.isbn AND
B.bname=’OS’
The max function is used to get the maximum value from the specified column.
For example – Consider the above created Test table
SQL Statement
SELECT Max(value)
FROM Test
Output
400
The sum function is used to get total sum value from the specified column. For
example – Consider the above created Test table
SQL Statement
SELECT sum(value)
FROM Test
Output
1000
1 AAA 60 Pune
2 BBB 70 Mumbai
3 CCC 90 Pune
4 DDD 55 Mumbai
5 EEE 84 Chennai
Query : Find the total marks of each student in the city named ‘Pune’ and ‘Mumbai’ only
SELECT SUM(marks), city
FROM Student
GROUP BY city
HAVING city IN(‘Pune’,’Mumbai’)
Database Management Systems 1 - 56 Relational Databases
Output
The result will be as follows –
SUM(marks) city
150 Pune
125 Mumbai
4 Geeta 4444
Student_City
sid cid
1 101
1 103
2 101
3 102
4 102
4 103
Database Management Systems 1 - 57 Relational Databases
Example 1 - If we want to find out sid who live in city ‘Pune’ or ‘Chennai’.
We can then write independent nested query using IN operator. Here we can
use the IN operator allows you to specify multiple values in a WHERE
clause. The IN operator is a shorthand for multiple OR conditions.
Step 2 : Using cid obtained in step 1 we can find the sid. The query will be
SELECT sid
FROM Student_City
WHERE cid IN
(SELECT cid FROM City WHERE cname=’Pune’ or cname=’Chennai’)
The inner query will return a set with members 101 and 103 and outer query will return
those sid for which cid is equal to any member of set (101 and 103 in this case). So, it will
return 1, 2 and 4.
Example 2 : If we want to find out sname who live in city ‘Pune’ or ‘Chennai’.
SELECT sname FROM Student WHERE sid IN
(SELECT sid FROM Student_City WHERE cid IN
(SELECT cid FROM City WHERE cname=’Pune’ or cname=’Chennai’))
Syntax
delete from table_name
where condition;
Example
delete from student
where RollNo=10
2. Insertion : The insert command is used to insert data into the table. There are two
syntaxes of inserting data into SQL
Syntax
(i) Insert into table_name (column1, column2, column3, ...)
values (value1, value2, value3, ...);
(ii) insert into table_name
values (value1, value2, value3, ...);
Example
(i) insert into Student(RollNo,Name,Makrs) values(101,’AAA’,56.45)
(ii) insert into Student values(101,’AAA’,56.45)
3. Update : The update statement is used to modify the existing records in the table.
update table_name
set column1=value1, column2=value2,…
where condition;
Example:
Delete student
Set Name=’WWW’
where RollNo=101
Example 1.13.1 Write the DDL, DML, DCL for the students database. Which contains
student details:name, id,DOB, branch, DOJ.
Course details : Course name, Course id, Stud.id,Faculty name, id, marks
AU : Dec.-17, Marks 15
Solution :
DDL Commands
CREATE TABLE Student
(
stud_name varchar(20),
stud_id int(3),
DOB varchar(15),
branch varchar(10),
DOJ varchar(15),
);
CREATE TABLE Course
(
Database Management Systems 1 - 59 Relational Databases
course_name varchar(20),
course_id int(5),
stud_id int(3),
facult_name varchar(20),
faculty_id varchar(5),
marks real
);
DML Commands
The commands which we will use here are insert and select. The insert command is
used to insert the values into database tables. Using the select command, the database
values can be displayed.
(1) Inserting values into Student table
insert into Student(stud_name,stud_id,DOB,branch,DOJ)
values(’AAA’,11,’01-10-1999’ , ’computers’,’5-3-2018’)
The DCL command is used to control privileges in Database. To perform any operation
in the database, such as for creating tables, sequences or views, a user needs privileges.
We will use the command GRANT.
To allow a user to create tables in the database, we can use the below command,
Grant create table to user1;
Example 1.13.2 C Write the following queries in relational algebra and SQL
(i) Find the names of employee who have borrowed a book published by McGraw Hill
(ii) Find the names of employees who have borrowed all books published by McGraw-Hill
AU : May 17, Marks 10
Solution :
We will assume the databases as –
member(memb_no, name, dob)
books(isbn, title, authors, publisher)
borrowed(memb_no, isbn, date)
(i) Relational Algebra :
name((publisher=’McGraw Hill’ books) ⋈ borrowed ⋈ member)
SQL :
SELECT name
FROM member
WHERE meber.memb_no=borrowed.memb_no
AND books.isbn=borrowed.isbn
AND books.publisher=’McGraw Hill’;
(ii) Relational Algebra
(Tempname,( memb_no,isbn borrowed)/ isbn (publisher=’McGraw Hill’ books)))
name(Tempname ⋈ member)
SQL :
SELECT distinct M.name
FROM Member M,
WHERE NOT EXIST
(
(SELECT isbn
FROM books
WHERE publisher = ’McGrawHill’
)
EXCEPT
(SELECT isbn
FROM borrowed R
Database Management Systems 1 - 61 Relational Databases
Solution :
(i)
SELECT P.name,C.description
FROM Professor P, Course C
WHERE P.ProfessorNumber=C.ProfessorNumber
HAVING count(DISTINCT P.name)=2
(ii)
SELECT P.name,C.description
FROM Professor P, Course C
WHERE P.ProfessorNumber=C.ProfessorNumber
(iii)
SELECT P.name,P.office, C.description
FROM Professor P, Course C
WHERE P.ProfessorNumber=C.ProfessorNumber
Database Management Systems 1 - 63 Relational Databases
(iv)
SELECT S.StudentNumber,S.StudentNumber,C.Description
FROM Student S, Course C, Registration R
WHERE S.StudentNumber=R.StudentNumber AND C.CourseNumber=R.CourseNumber
(v)
SELECT S.StudentName, P.Name
FROM Student S, Course C, Professor P, Registration R
WHERE C.ProfessorNumber=P.ProfessorNumber
AND C.CourseNumber=R.CourseNumber
AND S.StudentNumber=R.StudentNumber
GROUP BY P.ProfessorNumber
(vi)
SELECT S.StudentName, C.Description
FROM Student S, Course C, Registration R
WHERE S.StudentNumber=R.StudentNumber
AND R.CourseNumber=C.CourseNumber
GROUP BY C.CourseNumber
University Questions
1. Explain aggregate functions in SQL with example. AU : May 18, Marks 13
2. Write DDL, DML,DCL commands for the students database. AU : Dec 17, Marks 7
5. Explain the six clauses in the syntax of SQL query and show what type of constructs can be specified in
each of the six clauses. Which of the six clauses are required and which are optional.
AU : Dec 15, Marks 16
6. Explain- DDL and DML AU : Dec 14, Marks 8
The high level languages which supports embedding SQLs within it are also known as
host language.
Database Management Systems 1 - 64 Relational Databases
query_error:
printf ("SQL error: %ld\n", sqlca->sqlcode);
exit();
bad_number:
printf ("Invalid order number.\n");
exit();
}
Database Management Systems 1 - 65 Relational Databases
University Questions
1. What is the need of embedded SQL. AU : May 17, Dec 17, Marks 2
2. What is embedded SQL ? Give an example AU : Dec 16, Marks 5, May-14, Dec 14, Marks 8
Dynamic SQL is a programming technique which allows to build the SQL statements
dynamically at runtime.
Dynamic SQL statements are not embedded in the source program but stored as strings
of characters that are manipulated during a program's runtime.
These SQL statements are either entered by a programmer or automatically generated
by the program.
Dynamic SQL statements also may change from one execution to the next without
manual intervention.
Dynamic SQL facilitates automatic generation and manipulation of program modules
for efficient automated repeating task preparation and performance.
Dynamic SQL facilitates the development of powerful applications with the ability to
create database objects for manipulation according to user input.
The simplest way to execute a dynamic SQL statement is with an EXECUTE
IMMEDIATE statement. This statement passes the SQL statement to the DBMS for
compilation and execution.
Example 1.15.1 Consider the relation student(Reg.No.,name,mark, and grade). Write
embedded dynamic SQL program in C language to retrieve all the students’ records whose
mark is more than 90. AU : May 17, Marks 11, Dec 17, Marks 6
Solution :
int main() {
/* Begin program */
EXEC SQL INCLUDE SQLCA;
Database Management Systems 1 - 66 Relational Databases
Ans. :
A Database Management System (DBMS) is collection of interrelated data and various
programs that are used to handle the data.
The primary goal of DBMS is to provide a way to store and retrieve the required
information from the database in convenient and efficient manner.
In addition, the database systems must ensure the safety of information stored.
1) DBMS removes the data redundancy that means there is no duplication of data in
database.
2) DBMS allows to retrieve the desired data in required format.
3) Data can be isolated in separate tables for convenient and efficient use.
4) Data can be accessed efficiently using a simple query language.
Ans. : Data abstraction means retrieving only required amount of information of the
system and hiding background details.
Q.5 What are three levels of data abstraction ? AU : Dec 02, 04,May 14, Dec 17
1. Physical Level
2. Logical Level
3. View Level
Q.6 Is it possible for several attributes to have same domain ? Illustrate your answer
with suitable example AU : Dec 04, Dec 15
Ans. : A domain is the set of legal values that can be assigned to an attribute. Each
attribute in a database must have a well-defined domain; we can’t mix values from
different domains in the same attribute. Hence it is not possible for several attributes to
have same domain.
For example - Student domain has attributes RollNo, Name, Address. Similarly
Employee domain has EmpID, Ename,Salary,Address. We can not define the same
domain for defining several attributes.
Q.7 Write the characteristic that distinguish the database approach with File based
approach AU : May 15, Dec 16
OR What are main differences between file processing system and a DBMS ?
AU : May 06, Dec 06
Ans. :
It is a collection of conceptual tools for describing data, relationships among data,
semantics (meaning) of data and constraints.
Data model is a structure below the database.
Q.12 What is data definition language ? Give example AU : Dec 16, May 18
Ans. :
Data Definition Language (DDL) is a specialized language used to specify a database
schema by a set of definitions.
It is a language which is used for creating and modifying the structures of tables,
views, indexes and so on.
Some of the common commands used in DDL are -CREATE, ALTER, DROP.
Ans. : DCL stands for Data Control Language. It includes commands such as GRANT
and REVOKE which mainly deals with the rights, permissions and other controls of
the database system.
Q.15 Why does SQL allow duplicate tuples in a table or in a query result ? AU : Dec 15
Ans. :
Data can be the same. Two people may have the same name. Since SQL is a database
where you store your data and data can be duplicate.
Database Management Systems 1 - 69 Relational Databases
But we can apply primary key constraints, Unique constraints or Distinct keyword to
identify the record uniquely
Q.16 Why key is essential? Write the different types of keys AU : Dec 04
Ans. :
Keys are used to specify the tuples distinctly in the given relation.
Various types of keys used in relational model are – Superkey, Candidate Keys,
primary keys, foreign keys.
Ans. :
The primary key is a candidate key chosen by the database designer to identify the
tuple in the relation uniquely.
For example – Consider a Student database as Student (RollNo,Name,Address). The
primary key for this database is RollNo.The primary is underlined.
Ans. :
Foreign key is a single attribute or collection of attributes in one table that refers to the
primary key of other table.
For example - Consider a Student database as Student (RollNo,Name,Address) and
Course(CourseId, CourseName, RollNo). Here RollNo is a foreign key
Q.19 What is the difference between primary key and foreign key ? AU : Dec 05
Ans. :
Primary Key Foreign Key
Primary key is a column or a set of Foreign key is a column or a set of
columns that can be used to uniquely columns that refer to a primary key or a
identify a row in a table candidate key of another table.
A table can have a single primary key, A table can have multiple foreign keys
that can reference different tables.
Ans. :
The referential integrity rule states that “whenever a foreign key value is used it
must reference a valid, existing primary key in the parent table”.
Database Management Systems 1 - 70 Relational Databases
Example : Consider the situation where you have two tables : Employees and
Managers. The Employees table has a foreign key attribute entitled ManagedBy,
which points to the record for each employee’s manager in the Managers table.
Ans. : Domain integrity ensures that all the data items in a column fall within a
defined set of valid values. Each column in a table has a defined set of values, such as the
set of all numbers for zip (five-digit), the set of all character strings for name.
Q.22 What are different types of integrity constraints used in designing relational
databases
AU : Dec 07
Q.23 List the reasons why null value might be introduced into the database AU : May 06
Ans. : NULL is a special value provided by database in two cases – i) When field
values of some tuples are unknown(For e.g. city name is not assigned) and
ii) inapplicable(For e.g. middle name is not present).
Q.25 Describe briefly any two undesirable properties that a database design may have ?
AU : Dec 02
Ans. : The two undesirable properties that a database design may have –
Q.26 Specify with suitable examples, the different types of keys used in database
management system. AU : Dec 02
Ans. : Data independence is an ability by which one can change the data at one level
without affecting the data at another level. Here level can be physical, conceptual or
external.
Q.29 What is meant by instance and Schema of the database AU : May 04, Dec 05
Ans. :
When information is inserted or deleted from the database then the database gets
changed. The collection of information at particular moment is called instances.
The overall design of the database is called schema
Ans:
Sr.No. Static SQL Dynamic SQL
Database Management Systems 1 - 72 Relational Databases
Notes
UNIT - II
Syllabus
Entity-Relationship model - E-R Diagrams - Enhanced-ER Model - ER-to-Relational Mapping -
Functional Dependencies - Non-loss Decomposition - First, Second, Third Normal Forms,
Dependency Preservation - Boyce/Codd Normal Form - Multi-valued Dependencies and Fourth
Normal Form - Join Dependencies and Fifth Normal Form.
Contents
2.1 Introduction to Entity Relationship Model
2.2 Mapping Cardinality
2.3 ER Diagrams
2.4 Enhanced ER Model
2.5 Examples based on ER Diagram
2.6 ER to Relational Mapping ................................. May-17, .............................. Marks 13
2.7 Concept of Relational Database Design
2.8 Functional Dependencies
2.9 Concept of Redundancy and Anomalies
2.10 Decomposition ................................................... Dec.-17, ................................ Marks 7
2.11 Normal Forms ........................................................ Dec.-14, 15, May-18 ........... Marks 16
2.12 Boyce / Codd Normal Form (BCNF)
2.13 Multivalued Dependencies and Fourth Normal Form May-14, Dec.-16 ................ Marks 16
2.14 Join Dependencies and Fifth Normal Form
2.15 Two Marks Questions with Answers
(2 - 1)
Database Management Systems 2-2 Database Design
2.1.2 ER Model
The ER data model specifies enterprise schema that represents the overall logical
structure of a database.
The E-R model is very useful in mapping the meanings and interactions of real-world
entities onto a conceptual schema.
The ER model consists of three basic concepts –
1) Entity Sets
Entity : An entity is an object that exists and is distinguishable from other objects.
For example - Student named “Poonam” is an entity and can be identified by her
name. The entity can be concrete or abstract. The concrete entity can be - Person,
Book, Bank. The abstract entity can be like - holiday, concept entity is represented
as a box.
Student Employee Department
Entity set : The entity set is a set of entities of the same types. For example - All
students studying in class X of the School. The entity set need not be disjoint. Each
entity in entity set have the same set of attributes and the set of attributes will
Database Management Systems 2-4 Database Design
distinguish it from other entity sets. No other entity set will have exactly the same
set of attributes.
2) Relationship Sets
Relationship is an association among two or more entities.
The relationship set is a collection of similar relationships. For example - Following
Fig. 2.1.2 shows the relationship works_for for the two entities Employee and
Departments.
The association between entity sets is called as participation. that is, the entity sets E1,
E2, . . . , En participate in relationship set R.
The function that an entity plays in a relationship is called that entity’s role.
3) Attributes
Attributes define the properties of a data object of entity. For example if student is an
entity, his ID, name, address, date of birth, class are its attributes. The attributes help
in determining the unique entity. Refer Fig. 2.1.3 for Student entity set with attributes
- ID, name, address. Note that entity is shown by rectangular box and attributes are
shown in oval. The primary key is underlined.
Types of Attributes
3) Derived attribute :
Derived attributes are the attributes that contain values that are calculated from other
attributes. To represent derived attribute there is dotted ellipse inside the solid ellipse. For
example –Age can be derived from attribute DateOfBirth. In this situation, DateOfBirth
might be called Stored Attribute.
Fig. 2.1.4
Database Management Systems 2-6 Database Design
2.3 ER Diagrams
An E-R diagram can express the overall logical structure of a database graphically.E-R
diagrams are used to model real-world objects like a person, a car, a company and the
relation between these real-world objects.
Features of ER model
i) E-R diagrams are used to represent E-R model in a database, which makes them
easy to be converted into relations (tables).
ii) E-R diagrams provide the purpose of real-world modeling of objects which makes
them intently useful.
iii) E-R diagrams require no technical knowledge and no hardware support.
iv) These diagrams are very easy to understand and easy to create even by a naive user.
v) It gives a standard solution of visualizing the data logically.
Relationship : Rhombus is
used to setup relationships
between two or more
entities.
Derived attribute :
Derived attributes are
those which are derived
based on other attributes,
for example, age can be
derived from date of birth.
To represent a derived
attribute, another dotted
ellipse is created inside the
main ellipse
Multivalued attribute : An
attribute that can hold
multiple values is known
as multivalued attribute.
We represent it with
double ellipses in an E-R
Diagram. E.g. A person can
have more than one phone
numbers so the phone
number attribute is
multivalued.
Database Management Systems 2-9 Database Design
ii) One to many : When entity A is associated with more than one entities at a time
then there is one to many relation. For example - One customer places order at a
time.
ii) Many to one : When more than one entities are associated with only one entity then
there is is many to one relation. For example - Many student take a
ComputerSciCourse.
iii) Many to many : When more than one entities are associated with more than one
entities. For example -Many teachers can teach many students.
Now consider the CUSTOMER entity, and that the customer buys products. If all
customers pay the same price for a product, regardless of supplier, then you have
a simple binary relationship between CUSTOMER and PRODUCT. For the
CUSTOMER/ PRODUCT relationship, the intersection attribute is retail_price.
Single ternary relation : Now consider a different scenario. Suppose the customer
buys products but the price depends not only on the product, but also on the
supplier. Suppose you needed a customerID, a productID, and a supplierID to
identify a price. Now you have an attribute that depends on three things and
hence you have a relationship between three entities (a ternary relationship) that
will have the intersection attribute, price.
Database Management Systems 2 - 12 Database Design
The entity set that has primary key is called as strong entity set
4. The member of strong entity set is The member of weak entity set is called
called as dominant entity set subordinate entity set.
6. The primary key is one of the The primary key of weak entity set is a
attributes which uniquely identifies combination of partial key and primary
its member. key of the strong entity set.
For example – There can be two subclass entities namely Hourly_Emps and
Contract_Emps which are subclasses of Empoyee class. We might have attributes
hours_worked and hourly_wage defined for Hourly_Emps and an attribute
contractid defined for ContractEmps.
Therefore, the attributes defined for an Hourly_Emps entity are the attributes for
Employees plus Hourly_Emps. We say that the attributes for the entity set
Employees are inherited by the entity set Hourly_Emps and that Hourly-Emps
ISA (read is a) Employees. It can be represented by following Fig. 2.4.1.
Fig. 2.4.1
3) Overlapping : When some entity can be a member of more than one subclasses. For
example - Person can be both a Student or a Staff. The And can be used to represent
this constraint.
2.4.3 Aggregation
A feature of the entity relationship model that allows a relationship set to participate in
another relationship set. This is indicated on an ER diagram by drawing a dashed box
around the aggregation.
For example - We treat the relationship set work and the entity sets employee and
project as a higher-level entity set called work.
OR Write short notes on : E-R diagram for banking system . AU : Dec.-14, Marks 8
Database Management Systems 2 - 17 Database Design
Solution :
Example 2.5.2 Consider the relation schema given in Figure. Design and draw an ER
diagram that capture the information of this schema. AU : May-17, Marks 5
Employee(empno,name,office,age)
Books(isbn,title,authors,publisher)
Loan(empno,isbn,date)
Database Management Systems 2 - 18 Database Design
Solution :
Example 2.5.3 Construct an E-R diagram for a car insurance company whose customers own
one or more cars each.Each car has associated with it zero to any number of recorded
accidents. Each insurance policy covers one or more cars and has one or more premium
payments associated with it. Each payment is for particular period of time and has an
associated due date and date when the payment was received. AU : Dec.-16, Marks 7
Solution :
Example 2.5.4 A car rental company maintains a database for all vehicles in its current fleet.
For all vehicles, it includes the vehicle identification number license number, manufacturer,
model, date of purchase and color. Special data are included for certain types of vehicles.
Database Management Systems 2 - 19 Database Design
Solution :
Example 2.5.5 Draw E-R diagram for the "Restaurant Menu Ordering System", which will
facilitate the food items ordering and services within a restaurant. The entire restaurant
scenario is detailed as follows. The customer is able to view the food items menu, call the
waiter, place orders and obtain the final bill through the computer kept in their table. The
Waiters through their wireless tablet PC are able to initialize a table for customers, control
the table functions to assist customers, orders, send orders to food preparation staff (chef)
and finalize the customer's bill. The Food preparation staffs (chefs), with their touch-display
interfaces to the system, are able to view orders sent to the kitchen by waiters. During
preparation they are able to let the waiter know the status of each item, and can send
notifications when items are completed. The system should have full accountability and
logging facilities, and should support supervisor actions to account for exceptional
Database Management Systems 2 - 20 Database Design
circumstances, such as a meal being refunded or walked out on. AU : May-15, Marks 16
Solution :
Example 2.5.6 A university registrar’s office maintains data about the following entities :
(1) courses, including number, title, credits, syllabus, and prerequisites;
(2) course offerings, including course number, year, semester, section number,
instructor(s), timings, and classroom;
(3) students, including student-id, name, and program; and
(4) instructors, including identification number, name, department, and title.
Further, the enrollment of students in courses and grades awarded to students in each
course they are enrolled for must be appropriately modeled. Construct an E-R diagram for
the registrar’s office. Document all assumptions that you make about the mapping
constraints.
AU : Dec.-13, Marks 10
Database Management Systems 2 - 21 Database Design
Solution :
The above ER model contains the redundant information, because every Employee,
Project, Machinery combination in works_on relationship is also considered in manages
Database Management Systems 2 - 22 Database Design
We can then create a binary relationship manages for between Manager and
(Employee, Project, Machinery).
Example 2.5.8 Construct an E-R diagram for a hospital with a set of patients and a set of
medical doctors. Associate with each patient a log of the various tests and examinations
conducted. AU : Dec.-07, Marks 8
Solution :
Database Management Systems 2 - 23 Database Design
In this section we will discuss how to map various ER model constructs to Relational
Model construct.
The SQL statement captures the information for above ER diageam as follows -
Declare foreign key constraints for all these fields from the entity sets.
For example - Consider following ER model
The SQL statement captures the information for relationship present in above ER
diagram as follows -
o By this approach the relationship associated with more than one entities is
separately represented using a table. For example - Consider following ER
diagram. Each Dept has at most one manager, according to the key
constraint on Manages.
Database Management Systems 2 - 25 Database Design
Here the constraint is each department has at the most one manager to manage it.
Hence no two tuples can have same DeptID. Hence there can be a separate table
named Manages with DeptID as Primary Key. The table can be defined using
following SQL statement
Approach 2 :
Method 1 : All the entities in the relationship are mapped to individual tables
InventoryItem(ID , name)
Book(ID,Publisher)
DVD(ID, Manufacturer)
Method 2 : Only subclasses are mapped to tables. The attributes in the superclass
are duplicated in all subclasses. For example -
Book(ID,name,Publisher)
DVD(ID, name,Manufacturer)
Method 3 : Only the superclass is mapped to a table. The attributes in the subclasses
are taken to the superclass. For example -
InventoryItem(ID , name,Publisher,Manufacturer)
This method will introduce null values. When we insert a Book record in the table, the
Manufacturer column value will be null. In the same way, when we insert a DVD record
in the table, the Publisher value will be null.
Database Management Systems 2 - 28 Database Design
Example 2.6.1 Construct an E-R diagram for a hospital with a set of patients and a set of
medical doctors. Associate with each patient a log of the various tests and examinations
conducted. Also construct appropriate tables for the ER diagram you have drawn.
Solution :
ER Diagram - Refer example 2.5.8.
Relational Mapping
University Question
1. Discuss the correspondence between the ER model construct and the relational model constructs.
Show how each ER model construct can be mapped to the relational model. Discuss the option for
mapping EER construct. AU : May-17, Marks 13
R N
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
Here, R->N is true. That means the functional dependency holds true here. Because for
every assigned RollNuumber of student there will be unique name. For instance : The
name of the Student whose RollNo is 1 is AAA. But if we get two different names for the
same roll number then that means the table does not hold the functional dependency.
Following is such table -
R N
1 AAA
2 BBB
3 CCC
1 XXX
2 YYY
In above table for RollNumber 1 we are getting two different names - “AAA” and
“XXX”. Hence here it does not hold the functional dependency.
B->D
B->E
Step 2 : Find the redundant entries and delete them. This can be done as follows -
A->CD
B->AE
This is a minimal cover or Canonical cover of functional dependencies.
2.9 Concept of Redundancy and Anomalies
Definition : Redundancy is a condition created in database in which same piece of
data is held at two different places.
Redundancy is at the root of several problems associated with relational schemas.
Problems caused by redundancy : Following problems can be caused by redundancy-
i) Redundant storage : Some information is stored repeatedly.
ii) Update anomalies : If one copy of such repeated data is updated then inconsistency
is created unless all other copies are similarly updated.
iii) Insertion anomalies : Due to insertion of new record repeated information get
added to the relation schema.
iv) Deletion anomalies : Due to deletion of particular record some other important
information associated with the deleted record get deleted and thus we may lose
some other important information from the schema.
Example : Following example illustrates the above discussed anomalies or redundancy
problems
Consider following Schema in which all possible information about Employee is
stored.
Database Management Systems 2 - 35 Database Design
1) Redundant storage : Note that the information about DeptID, DeptName and
DeptLoc is repeated.
2) Update anomalies : In above table if we change DeptLoc of Pune to Chennai, then
it will result inconsistency as for DeptID 101 the DeptLoc is Pune. Or otherwise, we
need to update multiple copies of DeptLoc from Pune to Chennai. Hence this is an
update anomaly.
3) Insertion anomalies : For above table if we want to add new tuple say
(5, EEE,50000) for DeptID 101 then it will cause repeated information of
(101, XYZ,Pune) will occur.
4) Deletion anomalies : For above table, if we delete a record for EmpID 4, then
automatically information about the DeptID 102,DeptName PQR and DeptLoc
Mumbai will get deleted and one may not be aware about DeptID 102. This causes
deletion anomaly.
2.10 Decomposition AU : Dec.-17, Marks 7
Decomposition is the process of breaking down one table into multiple tables.
Formal definition of decomposition is -
A decomposition of relation Schema R consists of replacing the relation Schema by
two relation schema that each contain a subset of attributes of R and together
include all attributes of R by storing projections of the instance.
For example - Consider the following table
Employee_Department table as follows -
Employee Table
Eid Ename Age City Salary
E001 ABC 29 Pune 20000
E002 PQR 30 Pune 30000
E003 LMN 25 Mumbai 5000
Database Management Systems 2 - 36 Database Design
Department Table
Deptid Eid DeptName
D001 E001 Finance
D002 E002 Production
D003 E003 Sales
D004 E004 Marketing
D005 E005 Human Resource
The decomposition is used for eliminating redundancy.
For example : Consider following relation Schema R in which we assume that the
grade determines the salary, the redundancy is caused
Schema R
Hence, the above table can be decomposed into two Schema S and T as follows :
Schema S Schema T
Name eid deptname Grade Grade Salary
AAA 121 Accounts 2 2 8000
AAA 132 Sales 3 3 7000
BBB 101 Marketing 4 4 7000
CCC 106 Purchase 2 2 8000
3) Checking some dependencies may require joining the instances of the decomposed
relations.
4) There may be loss of information during decomposition.
Example 2.10.2 Consider the following relation R(A,B,C,D,E,F) and FDs A->BC, C->A,
D->E, F->A, E->D is the decomposition of R into R1(A,C,D), R2(B,C,D), and R3(E,F,D).
Check for lossless.
Solution :
Step 1 : R1 R2 R3=R. Here the first condition for checking lossless join is satisfied
as (A,C,D)∪ (B,C,D) ∪ (E,F,D)={A,B,C,D,E,F} which is nothing but R.
Step 2 : Consider R1∩ R2={CD} and R2∩R3={D}. Hence second condition of
intersection not being gets satisfied.
Step 3 : Now, consider R1(A,C,D) and R2(B,C,D). We find R1∩R2={CD}
(CD)+ = {ABCDE} attributes of R1 i.e.{A,C,D}. Hence condition 3 for checking
lossless join for R1 and R2 gets satisfied.
Step 4 : Now, consider R2(B,C,D) and R3(E,F,D) . We find R2∩R3={D}.
(D)+={D,E} which is neither complete set of attributes of R2 or R3.[Note that F is
missing for being attribute of R3].
Hence it is not lossless join decomposition. Or in other words we can say it is a
lossy decomposition.
Example 2.10.3 Suppose that we decompose schema R=(A,B,C,D,E) into (A,B,C) (C,D,E)
Show that it is not a lossless decomposition.
Solution :
Step 1 : Here we need to assume some data for the attributes A, B, C, D, and E.
Using this data we can represent the relation as follows –
Relation R
A B C D E
a 1 x p q
b 2 x r s
Relation R1 = (A,B,C)
A B C
a 1 x
b 2 x
Relation R2 = (C,D,E)
C D E
x p q
x r s
Database Management Systems 2 - 39 Database Design
Step 2 : Now we will join these tables using natural join, i.e. the join based on
common attribute C. We get R1 ⋈ R2 as
A B C D E
a 1 x p q
Here we get more rows or
a 1 x r s tuples than original
b 2 x p q relation R
b 2 x r s
Clearly R1 ⋈ R2 R. Hence it is not lossless decomposition.
Example 2.10.4 Consider the relation R (A, B, C) for functional dependency set {A -> B and
B -> C} which is decomposed into two relations R1 = (A, C) and R2 = (B, C). Then check if
this decomposition dependency preserving or not.
Solution : This can be solved in following steps :
Step 1 : For checking whether the decomposition is dependency preserving or not
we need to check
following condition
F+ = (F1 F2)+
Step 2 : We have with us the F+ ={ A->B and B->C }
+ +
Step 3 : Let us find (F1) for relation R1 and (F2) for relation R2
R1(A,C) R2(B,C)
A->A Trivial B->B Trivial
C->C Trivial C->C Trivial
A->C In (F)+A->B->C and it is Nontrivial B->C In (F)+ B->C and it is Non-Trivial
AC->AC Trivial BC->BC Trivial
A->B but is not useful as B is not part of R1 We can not obtain C->B
set
We can not obtain C->A
Database Management Systems 2 - 40 Database Design
Step 4 : We will eliminate all the trivial relations and useless relations. Hence we
can obtain R1 and R2 as
R1(A,C) R2(B,C)
A->C Nontrivial
B->C Non-Trivial
Example 2.10.5 Let relation R(A,B,C,D) be a relational schema with following functional
dependencies {A->B, B->C,C->D, and D->B}. The decomposition of R into (A,B), (B,C)
and (B,D). Check whether this decomposition is dependency preserving or not.
Solution :
Step 1 : Let (F)+ = {A->B, B->C, C->D,D->B}.
Step 2 : We will find (F1)+, (F2)+, (F3)+ for relations R1(A,B) , R2(B,C) and R3(B,D) as
follows -
Step 3 : We will eliminate all the trivial relations and useless relations. Hence we
can obtain R1 ∪ R2 ∪ R3 as
University Question
1. Differentiate between lossless join decomposition and dependency preserving decomposition.
AU : Dec.-17, Marks 7
As there are multiple values of phone number for sid 1 and 3, the above table is not in
1NF. We can make it in 1NF. The conversion is as follows -
For example : Consider following table in which every information about a the
Student is maintained in a table such as student id(sid), student name(sname), course
id(cid) and course name(cname).
Student_Course
sid sname cid cname
1 AAA 101 C
2 BBB 102 C++
3 CCC 101 C
4 DDD 103 Java
This table is not in 2NF. For converting above table to 2NF we must follow the
following steps -
Step 1 : The above table is in 1NF.
Step 2 : Here sname and sid are associated similarly cid and cname are associated
with each other. Now if we delete a record with sid=2, then automatically the
course C++ will also get deleted. Thus,
sid->sname or cid->cname is a partial functional dependency, because {sid,cid}
should be essentially a candidate key for above table. Hence to bring the above table
to 2NF we must decompose it as follows :
Student
Here candidate key is
sid sname cid (sid,cid)
and
1 AAA 101
(sid,cid)->sname
2 BBB 102
3 CCC 101
4 DDD 103
Course
cid cname
Here candidate key is
101 C cid
101 C
103 Java
101 1 AAA
102 2 BBB
103 3 CCC
104 4 DDD
Superkeys
{RegID}
{RegID, RollNo}
{RegID,Sname}
{RollNo,Sname}
{RegID, RollNo,Sname}
Candidate Keys
{RegID}
{RollNo}
Or in other words
In other words 3NF can be defined as : A table is in 3NF if it is in 2NF and for each
functional dependency
X-> Y
at least one of the following conditions hold :
i) X is a super key of table
ii) Y is a prime attribute of table
For example : Consider following table Student_details as follows -
Zip
zipcode cityname state
11111 Pune Maharashtra
22222 Surat Gujarat
33333 Chennai Tamilnadu
44444 Jaipur Rajasthan
55555 Mumbai Maharashtra
Similarly AB AF ∵ B F (given)
Thus now using union rule
AB ABCDEFGHIJ
AB is a key
The table can be converted to 2NF as
Database Management Systems 2 - 47 Database Design
R1 = (A, B, C)
R2 = (A, D, E, I, J)
R3 = (B, F, G, H)
R2 = (A, D, E)
R3 = (D, I, J)
R4 = (B, E)
R5 = (E, G, H).
University Questions
1. What is database normalization ? Explain the first normal form, second normal form and third
normal form. AU : May-18, Marks 13; Dec.-15, Marks 16
2. What are normal forms. Explain the types of normal form with an example.
AU : Dec.-14, Marks 16
Enrollment Table
sid course Teacher
1 C Ankita
1 Java Poonam
Database Management Systems 2 - 48 Database Design
2 C Ankita
3 C++ Supriya
4 C Archana
From above table following observations can be made :
One student can enroll for multiple courses. For example student with sid=1 can
enroll for C as well as Java.
For each course, a teacher is assigned to the student.
There can be multiple teachers teaching one course for example course C can be
taught by both the teachers namely - Ankita and Archana.
The candidate key for above table can be (sid,course), because using these two
columns we can find
The above table holds following dependencies
o (sid,course)->Teacher
o Teacher->course
The above table is not in BCNF because of the dependency teacher->course. Note
that the teacher is not a superkey or in other words, teacher is a non prime
attribute and course is a prime attribute and non-prime attribute derives the prime
attribute.
To convert the above table to BCNF we must decompose above table into Student
and Course tables
Student
sid Teacher
1 Ankita
1 Poonam
2 Ankita
3 Supriya
4 Archana
Course
Teacher course
Ankita C
Poonam Java
Ankita C
Supriya C++
Archana C
Now the table is in BCNF
Database Management Systems 2 - 49 Database Design
(AC) + = {AC} R
There is no involvement of D on LHS of the FD rules. Hence D can not be part of any
candidate key. Thus we obtain two candidate keys (AB)+ and (BC)+. Hence
prime attributes = {A,B,C}
Non prime attributes = {D}
Step 2 : Now, we will start checking from reverse manner, that means from BCNF,
then 3NF, then 2NF.
Step 3 : For R being in BCNF for X->Y the X should be candidate key or super key.
From above FDs consider C->D in which C is not a candidate key or super key.
Hence given relation is not in BCNF.
Step 4 : For R being in 3NF for X->Y either i) the X should be candidate key or super
key or ii) Y should be prime attribute. (For prime and non prime attributes refer
step 1)
o Consider C->A. In this FD the C is not candidate key. Condition for 2NF is
satisfied.
o Now consider B->D. In this FD, the B is a part of candidate key(AB or BC),
similarly D is not a prime attribute. That means partial functional
dependency occurs here. Hence condition for 2NF fails over here.
Hence given relation is not in 2NF.
Therefore we can conclude that the given relation R is in 1NF.
Example 2.12.2 Consider a relation R(ABC) with following FD A->B, B->C and C->A.
What is the normal form of R ?
Solution :
Step 1 : We will find the candidate key
(A)+ = {ABC} =R
(B)+ = {ABC} =R
(C)+ = {ABC} =R
Hence A, B and C all are candidate keys
Prime attributes = {A,B,C}
Non prime attribute{}
Step 2 : For R being in BCNF for X->Y the X should be candidate key or super key.
From above FDs
Student
sid Course Skill
1 C English
C++ German
2 Java English
French
Here sid =1 leads to multiple values for courses and skill. Following table shows this
Database Management Systems 2 - 52 Database Design
1 C English
1 C++ German
1 C German
1 C++ English
2 Java English
2 Java French
Here sid and course are dependent but the Course and Skill are independent. The
multivalued dependency is denoted as :
sid Course
sid Skill
Student Table
sid Course Skill
1 C English
1 C++ German
1 C German
1 C++ English
2 Java English
2 Java French
Now to convert the above table to 4NF we must decompose the table into following
two tables.
Database Management Systems 2 - 53 Database Design
Student_Course Table
Key : (sid,Course)
sid Course
1 C
1 C++
2 Java
Student_Skill Table
Key : (sid,Skill)
sid Skill
1 English
1 German
2 English
2 French
Thus the tables are now in 4NF.
University Questions
1. Explain first normal form, second normal form, third normal form and BCNF with example.
AU : Dec.-16, Marks 13
2. Explain Boyce Codd Normal form and fourth normal form with suitable example.
AU : May-14, Marks 16
The above table is in 4th Normal Form as there is no multivalued dependency. But it
is not in 5th normal form because if we join the above two table we may get
To avoid the above problem we can decompose the tables into three tables as
Seller_Company, Seller_Product, and Company Product table
Seller_Company Seller_Product Company_Product
Seller Company Seller Product Company Product
Rupali Godrej Rupali Cinthol Godrej Cinthol
Sharda Dabur Sharda Honey Dabur Honey
Sunil Amul Sharda HairOil Dabur HairOil
Sunil Britania Sharda RoseWater Dabur RoseWater
Sunil Icecream Amul Icecream
Sunil Biscuit Britania Biscuit
Q.2 Give the limitations of E-R model ? How do you overcome this ? AU : May-07
Ans. : 1) Loss of information content : Some information be lost or hidden in ER
model
2) Limited relationship representation : ER model represents limited relationship as
compared to another data models like relational model etc.
3) No representation of data manipulation : It is difficult to show data manipulation
in ER model.
4) Popular for high level design : ER model is very popular for designing high level
design.
Q.9 Why certain functional dependencies are called trivial functional dependencies ?
AU : May-06,12
Ans. : A functional dependency FD : X → Y is called trivial if Y is a subset of X.
This kind of dependency is called trivial because it can be derived from common
sense. If one "side" is a subset of the other, it's considered trivial. The left side is
considered the determinant and the right the dependent.
For example - {A,B} –> B is a trivial functional dependency because B is a subset of
A,B. Since {A,B} –> B includes B, the value of B can be determined. It's a trivial
functional dependency because determining B is satisfied by its relationship to
A,B
Q.13 Describe BCNF and describe a relation which is in BCNF. AU : Dec. -02
Ans. : Refer section 2.12.
Q.14 Why 4NF in normal form is more desirable than BCNF ? AU : Dec. -14
Ans. :
4NF is more desirable than BCNF because it reduces the repetition of information.
If we consider a BCNF schema not in 4NF we observe that decomposition into
4NF does not lose information provided that a lossless join decomposition is used,
yet redundancy is reduced.
Q.15 Give an example of a relation schema R and set of dependencies such that R is in
BCNF but not in 4NF. AU : May -12
Ans. : Consider relation R(A,B,C,D) with dependencies
AB C
ABC D
AC B
Here the only key is AB. Thus each functional dependency has superkey on the left.
But MVD has non-superky on its left. So it is not 4NF.
Ans. : Decomposition is the process of breaking down one table into multiple
tables.
The decomposition is used for eliminating redundancy.
Database Management Systems 2 - 60 Database Design
Notes
UNIT - III
Syllabus
Transaction Concepts - ACID Properties - Schedules - Serializability - Concurrency Control - Need
for Concurrency - Locking Protocols - Two Phase Locking - Deadlock - Transaction Recovery -
Save Points - Isolation Levels - SQL Facilities for Concurrency and Recovery.
Contents
3.1 Transaction Concepts ................................................. Dec.-14 ................................. Marks 4
3.2 ACID Properties ........................................................... May-14, 18 ........................... Marks 8
3.3 Transaction States ....................................................... Dec.-11, May-14,18 ............. Marks 8
3.4 Schedules
3.5 Serializability ................................................................ Dec.-15, May-15,18 ............. Marks 8
3.6 Transaction Isolation and Atomicity
3.7 Introduction to Concurrency Control
3.8 Need for Concurrency ................................................. May-17 ............................... Marks 13
3.9 Locking Protocols ......................................................... Dec-15,17, May-16 ............ Marks 16
3.10 Two Phase Locking..................................................... May-14,18, Dec.-16 ............. Marks 7
3.11 Time Stamp Based Protocol
3.12 Dead Lock ................................................................... Dec-14,15,16, May-09 ....... Marks 16
3.13 Transaction Recovery
3.14 Save Points
3.15 Isolation Levels
3.16 SQL Facilities for Concurrency and Recovery ............ May-14 ................................. Marks 8
3.17 Two Marks Questions with Answers
(3 - 1)
Database Management Systems 3-2 Transactions
University Question
1) Atomicity :
This property states that each transaction must be considered as a single unit and
must be completed fully or not completed at all.
No transaction in the database is left half completed.
Database should be in a state either before the transaction execution or after the
transaction execution. It should not be in a state ‘executing’.
For example - In above mentioned withdrawal of money transaction all the five
steps must be completed fully or none of the step is completed. Suppose if
transaction gets failed after step 3, then the customer will get the money but the
balance will not be updated accordingly. The state of database should be either at
before ATM withdrawal (i.e customer without withdrawn money) or after ATM
withdrawal (i.e. customer with money and account updated). This will make the
system in consistent state.
Database Management Systems 3-3 Transactions
2) Consistency :
The database must remain in consistent state after performing any transaction.
For example : In ATM withdrawal operation, the balance must be updated
appropriately after performing transaction. Thus the database can be in consistent
state.
3) Isolation :
In a database system where more than one transaction are being executed
simultaneously and in parallel, the property of isolation states that all the
transactions will be carried out and executed as if it is the only transaction in the
system.
No transaction will affect the existence of any other transaction.
For example : If a bank manager is checking the account balance of particular
customer, then manager should see the balance either before withdrawing the
money or after withdrawing the money. This will make sure that each individual
transaction is completed and any other dependent transaction will get the
consistent data out of it. Any failure to any transaction will not affect other
transaction in this case. Hence it makes all the transactions consistent.
4) Durability :
The database should be strong enough to handle any system failure.
If there is any set of insert /update, then it should be able to handle and commit to
the database.
If there is any failure, the database should be able to recover it to the consistent
state.
For example : In ATM withdrawal example, if the system failure happens after
Customer getting the money then the system should be strong enough to update
Database with his new balance, after system recovers. For that purpose the system
has to keep the log of each transaction and its failure. So when the system
recovers, it should be able to know when a system has failed and if there is any
pending transaction, then it should be updated to Database.
University Questions
1. Explain with an example the properties that must be satisfied by transaction
AU : May-18, Marks 7
Solution :
(1) Read only transaction
T1
Read(A)
Read(B)
Display(A-B)
(3)
T1 T2
Read(A) Assume A=100
A=A+50 A=150
Write(A)
Read(A) A=150
A=A+100 A=250
University Questions
1. During execution, a transaction passes through several states, until it finally commits or aborts.
List all possible sequences of states through which transaction may pass. Explain why each state
transaction may occur ? AU : May-18, Marks 6
3.4 Schedules
Schedule is an order of multiple transactions executing in concurrent environment.
Following figure represents the types of schedules.
Serial Schedule : The schedule in which the transactions execute one after the other is
called serial schedule. It is consistent in nature. For example : Consider following two
transactions T1 and T2
Database Management Systems 3-6 Transactions
T1 T2
R(A)
W(A)
R(B)
W(B)
R(A)
W(A)
R(B)
W(B)
All the operations of transaction T1 on data items A and then B executes and then in
transaction T2 all the operations on data items A and B execute. The R stands for Read
operation and W stands for write operation.
Non Serial Schedule : The schedule in which operations present within the transaction
are intermixed. This may lead to conflicts in the result or inconsistency in the resultant
data. For example-
Consider following two transactions,
T1 T2
R(A)
W(A)
R(A)
W(B)
R(A)
W(B)
R(B)
W(B)
The above transaction is said to be non serial which result in inconsistency or conflicts
in the data.
3.5 Serializability AU : Dec.-15, May-15, 18, Marks 8
For example :
T1 A B T2
W(A)
B=B+10
W(B)
90 110
A=A-10
W(A)
80 110
In above transactions initially T1 will read the values from database as A=100,
B=100 and modify the values of A and B. But transaction T2 will read the modified
value i.e. 90 and will modify it to 80 and perform write operation. Thus at the end
of transaction T1 value of A will be 90 but at end of transaction T2 value of A will
be 80. Thus conflicts or inconsistency occurs here. This sequence can be converted
to a sequence which may give us consistent result. This process is called
serializability.
Difference between Serial Schedule and Serializable Schedule
Serial Schedule Serializable Schedule
No concurrency is allowed in serial schedule. Concurrency is allowed in serializable schedule.
In serial schedule, if there are two transactions In serializable schedule, if there are two
executing at the same time and no interleaving of transactions executing at the same time and
operations is permitted, then following can be the interleaving of operations is allowed there can be
possibilities of execution – different possible orders of executing an
(i) Execute all the operations of transactions T1 in individual operation of the transactions.
a sequence and then execute all the operations of
transactions T2 in a sequence.
(ii) Execute all the operations of transactions T2 in
a sequence and then execute all the operations of
transactions T1 in a sequence.
Database Management Systems 3-8 Transactions
T1 T2 T1 T2
Read(A) Read(A)
A=A-50 A=A-50
Write(A) Write(A)
Read(B) Read(B)
B=B+100 B=B+100
Write(B) Write(B)
Read(A) Read(B)
A=A+10 Write(B)
Write(A)
T1 T2
R(A)
W(A)
R(A)
R(B)
R(B)
W(B)
Solution :
Step 1 : To check whether the schedule is conflict serializable or not we will check from
top to bottom. Thus we will start reading from top to bottom as
T1: R(A) -> T1:W(A) ->T2:R(A) -> T2:R(B) ->T1:R(B)->T1:W(B)
Database Management Systems 3 - 10 Transactions
Step 2 : We will find conflicting operations. Two operations are called as conflicting
operations if all the following conditions hold true for them-
i) Both the operations belong to different transactions.
ii) Both the operations are on same data item.
iii) At least one of the two operations is a write operation
From above given example in the top to bottom scanning we find the conflict as
T1:W(A) ->T2:R(A).
i) Here note that there are two different transactions T1 and T2,
Step 4 : Draw the edge between conflicting transactions. For example in above given
scenario, the conflict occurs while moving from T1:W(A) to T2:R(A). Hence edge must be
from T1 to T2.
Step 5 : Repeat the step 4 while reading from top to bottom. Finally the precedence
graph will be as follows
Step 6 : Check if any cycle exists in the graph. Cycle is a path using which we can start
from one node and reach to the same node. If the is cycle found then schedule is not
conflict serializable. In the step 5 we get a graph with cycle, that means given schedule is
not conflict serializable.
Example 3.5.2 Check whether following schedule is conflict serializable or not. If it is not
conflict serializable then find the serializability order.
Database Management Systems 3 - 11 Transactions
T1 T2 T3
R(A)
R(B)
R(B)
W(B)
W(A)
W(A)
R(A)
W(A)
Solution :
Step 1 : We will read from top to bottom, and build a precedence graph for conflicting
entries :
Step 2 : As there is no cycle in the precedence graph, the given sequence is conflict
serializable. Hence we can convert this non serial schedule to serial schedule. For that
purpose we will follow these steps to find the serializable order.
Step 3 : A serializability order of the transactions can be obtained by finding a linear
order consistent with the partial order of the precedence graph. This process is called
topological sorting.
Step 4 : Find the vertex which has no incoming edge which is T1. Finally find the vertex
having no outgoing edge which is T2. So in between them is T3. Hence the order will be
T1 – T3 –T2
3.5.2 View Serializability
If a given schedule is found to be view equivalent to some serial schedule, then it is called
as a view serializable schedule.
View Equivalent Schedule : Consider two schedules S1 and S2 consisting of transactions
Database Management Systems 3 - 12 Transactions
T1 and T2 respectively, then schedules S1 and S2 are said to be view equivalent schedule if
it satisfies following three conditions :
o If transaction T1 reads a data item A from the database initially in schedule S1,
then in schedule S2 also, T1 must perform the initial read of the data item X
from the database. This is same for all the data items. In other words - the
initial reads must be same for all data items.
o If data item A has been updated at last by transaction Ti in schedule S1, then in
schedule S2 also, the data item A must be updated at last by transaction Ti.
o If transaction Ti reads a data item that has been updated by the transaction Tj
in schedule S1, then in schedule S2 also, transaction Ti must read the same data
item that has been updated by transaction Tj. In other words the Write-Read
sequence must be same.
T1 T2 T3
W(C)
R(A)
W(B)
R(C)
W(B)
W(B)
ii) The final write is performed by T1 on the same data item B. Hence T1 will be at the
last position.
iii) The data item C is written by T3 and then it is read by T1. Hence T3 should appear
before T1.
Thus we get the order of schedule of view serializability as T2 – T1 – T3
A B
0 0
0 1
0 1
AB =A OR B=FT=T. This means consistency is met.
Consider case T2->T1 then
A B
0 0
1 0
1 0
AB = A OR B = FT = T. This means consistency is met.
Database Management Systems 3 - 14 Transactions
(2) The concurrent execution means interleaving of transactions T1 and T2. It can be
T1 T2
R(A)
R(B)
R(A)
R(B) If B=0 then
If A=0 then A=A+1
B=B+1 W(A)
W(B)
This is a non-serializable schedule.
(3) There is no concurrent execution resulting in a serializable schedule.
Example 3.5.5 Test serializability of the following schedule :
i) r1(x);r3(x);w1(x);r2(x);w3(x) ii) r3(x);r2(x);w3(x);r1(x);w1(x)
Solution : i) r1(x);r3(x);w1(x);r2(x);w3(x)
The r1 represents the read operation of transaction T1, w3 represents the write operation
on transaction T3 and so on. Hence from given sequence the schedule for three
transactions can be represented as follows :
T1 T2 T3
r1(x)
r3(x)
w1(x)
r2(x)
w3(x)
Step 1 : We will use the precedence graph method to check the serializability. As there
are three transactions, three nodes are created for each transaction.
Step 2 : We will read from top to bottom. Initially we read r1(x) and keep on moving
bottom in search of write operation. Here all the transactions work on same data item i.e.
x. Now we get a write operation in T3 as w3(x). Hence the dependency is from T1 to T3.
Therefore we draw edge from T1 to T3.
Similarly, for r3(x) we get w1(x) pair. Hence there will be edge from T3 to T1. Continuing
in this fashion we get the precedence graph as
Step 3 : As cycle exists in the above precedence graph, we conclude that it is not
serializable.
ii) r3(x);r2(x);w3(x);r1(x);w1(x)
T1 T2 T3
r3(x)
r2(x)
w3(x)
r1(x)
w1(x)
Step 1 : Read the schedule from top to bottom for pair of operations. For r3(x) we get
w1(x) pair. Hence edge exists from T3 to T1 in precedence graph.
There is a pair from r2(x) : w3(x). Hence edge exists from T2 to T3.
There is a pair from r2(x) : w1(x). Hence edge exists from T2 to T1.
There is a pair from w3(x): r1(x). Hence edge exists from T3 to T1.
Database Management Systems 3 - 16 Transactions
Step 3 : As there is no cycle in the above graph, the given schedule is serializable.
Step 4 : The searializability order for consistent schedule will be obtained by applying
topological sorting on above drawn precedence graph. This can be achieved as follows,
Sub-Step 1 : Find the node having no incoming edge. We obtain T2 is such a node. Hence
T2 is at the beginning of the serializability sequence. Now delete T2. The Graph will be
Thus we obtain the sequence of transactions as T2, T3 and T1. Hence the serializability
order is
r2(x);r3(x);w3(x):r1(x);w1(x)
Example 3.5.6 Consider the following schedules. The actions are listed in the order they are
scheduled, and prefixed with the transaction name.
S1 : T1 : R(X), T2 : R(X), T1 : W(Y), T2 : W(Y) T1 : R(Y), T2 : R(Y)
S2 : T3 : W(X), T1 : R(X), T1 : W(Y), T2 : R(Z), T2 : W(Z) T3 : R(Z)
For each of the schedules, answer the following questions :
i) What is the precedence graph for the schedule ?
ii) Is the schedule conflict-serializable ? If so, what are all the conflict equivalent serial
schedules ?
iii) Is the schedule view-serializable ? If so, what are all the view equivalent serial schedules ?
AU : May-15, Marks 2 + 7 + 7
Database Management Systems 3 - 17 Transactions
Solution : i) We will find conflicting operations. Two operations are called as conflicting
operations if all the following conditions hold true for them-
o T2 : W(Y), T1 : R(Y)
Hence we will build the precedence graph. Draw the edge between conflicting
transactions. For example in above given scenario, the conflict occurs while moving from
T1:W(Y) to T2:W(Y). Hence edge must be from T1 to T2. Similarly for second conflict, there
will be the edge from T2 to T1
(iii)
o S1 is not view serializable.
T2-T3-T1.
Database Management Systems 3 - 18 Transactions
University Questions
1. Explain Conflict serializability and view serializability.
AU : May-18, Marks 6, Dec.-15, Marks 8
T1 T2
R(A)
A=A+50
W(A)
R(A)
A=A-20
W(A)
Commit
Failure
some transaction…
Commit
Database Management Systems 3 - 19 Transactions
The above schedule is inconsistent if failure occurs after the commit of T2.
It is because T2 is dependable transaction on T1. A transaction is said to be
dependable if it contains a dirty read.
The dirty read is a situation in which one transaction reads the data immediately
after the write operation of previous transaction
T1 T2
R(A)
A=A+50
W(A) Dirty read
R(A)
A=A-20
W(A)
Commit
Commit
Now if the dependable transaction i.e. T2 is committed first and then failure occurs
then if the transaction T1 makes any changes then those changes will not be
known to the T2. This leads to non recoverable state of the schedule.
To make the schedule recoverable we will apply the rule that - commit the
independent transaction before any dependable transaction.
In above example independent transaction is T1, hence we must commit it before
the dependable transaction i.e. T2.
R(A)
A=A+50
W(A)
R(A)
A=A-20
W(A)
Commit
Commit
Database Management Systems 3 - 20 Transactions
The cascadeless schedule allows only committed Read operation. For example :
T1 T2 T3
R(A)
A=A+50
W(A)
Commit
R(A)
A=A-20
W(A)
Commit
R(A)
W(A)
In above schedule at any point if the failure occurs due to commit operation before
every Read operation of each transaction, the schedule becomes recoverable and atomicity
can be maintained.
A database can have multiple transactions running at the same time. This is called
concurrency.
To preserve the isolation property, the system must control the interaction among
the concurrent transactions; this control is achieved through one of a variety of
mechanisms called concurrency control schemes.
Definition of concurrency control : A mechanism which ensures that
simultaneous execution of more than one transactions does not lead to any
database inconsistencies is called concurrency control mechanism.
The concurrency control can be achieved with the help of various protocols such
as - lock based protocol, Deadlock handling, Multiple Granularity, Timestamp
based protocol, and validation based protocols.
The result of the above sequence is that the update made by transaction T1 is
completely lost. Therefor this problem is called as lost update problem.
(2) Dirty Read or Uncommited Read Problem : The dirty read is a situation in which
one transaction reads the data immediately after the write operation of previous
transaction
T1 T2
R(A)
A=A+50
W(A) Dirty read
R(A)
A=A-20
W(A)
Commit
Commit
For example – Consider following transactions -
Assume initially salary is = ` 1000
Database Management Systems 3 - 23 Transactions
(1) At the time t1, the transaction T2 updates the salary to `1200
(2) This salary is read at time t2 by transaction T1. Obviously it is ` 1200
(3) But at the time t3, the transaction T2 performs Rollback by undoing the changes
made by T1 and T2 at time t1 and t2.
(4) Thus the salary again becomes = ` 1000. This situation leads to Dirty Read or
Uncommited Read because here the read made at time t2(immediately after
update of another transaction) becomes a dirty read.
The phantom read problem is a special case of non repeatable read problem.
This is a problem in which one of the transaction makes the changes in the database
system and due to these changes another transaction can not read the data item which
it has read just recently. For example –
(1) At time t1, the transaction T1 reads the value of salary as ` 1000
(2) At time t2, the transaction T2 reads the value of the same salary as ` 1000
(3) At time t3, the transaction T1 deletes the variable salary.
(4) Now at time t4, when T2 again reads the salary it gets error. Now transaction T2 can
not identify the reason why it is not getting the salary value which is read just few
time back.
This problem occurs due to changes in the database and is called phantom read
problem.
University Question
1. Discuss the violations caused by each of the following: dirty read, non repeatable read and phantoms
with suitable example AU : May-17, Marks 13
i) Shared Lock : The shared lock is used for reading data items only. It is denoted by
Lock-S. This is also called as read lock.
ii) Exclusive Lock : The exclusive lock is used for both read and write operations. It is
denoted as Lock-X. This is also called as write lock.
The compatibility matrix is used while working on set of locks. The concurrency
control manager checks the compatibility matrix before granting the lock. If the
two modes of transactions are compatible to each other then only the lock will be
granted.
In a set of locks may consists of shared or exclusive locks. Following matrix
represents the compatibility between modes of locks.
S X
S T F
X F F
Here T stands for True and F stands for False. If the control manager get the
compatibility mode as True then it grant the lock otherwise the lock will be denied.
For example : If the transaction T1 is holding a shared lock in data item A, then the
control manager can grant the shared lock to transaction T2 as compatibility is
True. But it cannot grant the exclusive lock as the compatibility is false. In simple
words if transaction T1 is reading a data item A then same data item A can be read
by another transaction T2 but cannot be written by another transaction.
Similarly if an exclusive lock (i.e. lock for read and write operations) is hold on the
data item in some transaction then no other transaction can acquire Shared or
exclusive lock as the compatibility function denotes F. That means of some
Database Management Systems 3 - 26 Transactions
transaction is writing a data item A then another transaction can not read or write
that data item A.
Hence the rule of thumb is
i) Any number of transactions can hold shared lock on an item.
ii) But exclusive lock can be hold by only one transaction.
University Questions
1. State and explain the lock based concurrency control with suitable example
AU : Dec-17, Marks 13, May-16, Marks 16
2. What is Concurrency control ? How is implemented in DBMS ? Illustrate with suitable example.
AU : Dec-15, Marks 8
The two phase locking is a protocol in which there are two phases :
i) Growing phase (Locking phase) : It is a phase in which the transaction may
obtain locks but does not release any lock.
Database Management Systems 3 - 27 Transactions
…
Unlock(A)
Unlock(B)
Unlock(C)
For example –
Consider following transactions
T1 T2
Lock-X(A) Lock-S(B)
Read(A) Read(B)
A=A-50 Unlock-S(B)
Write(A)
Lock-X(B)
Unlock-X(A)
B=B+100 Lock-S(A)
Write(B) Read(A)
Unlock-X(B) Unlock-S(A)
The important rule for being a two phase locking is - All Lock operations precede all
the unlock operations.
In above transactions T1 is in two phase locking mode but transaction T2 is not in two
phase locking. Because in T2, the Shared lock is acquired by data item B, then data item B
is read and then the lock is released. Again the lock is acquired by data item A , then the
data item A is read and the lock is then reloased. Thus we get lock-unlock-lock-unlock
sequence. Clearly this is not possible in two phase locking.
Database Management Systems 3 - 28 Transactions
The serializability using two phase locking can be understood with the help of
following example
Consider two transactions
T1 T2
R(A)
R(A)
R(B)
W(B)
Step 1 : Now we will apply two phase locking. That means we will apply locks in
growing and shrinking phase
T1 T2
Lock-S(A)
R(A)
Lock-S(A)
R(A)
Lock-X(B)
R(B)
W(B)
Unlock-X(B)
Unlock-S(A)
Database Management Systems 3 - 29 Transactions
The two phase locking protocol leads to two problems – deadlock and cascading
roll back.
(1) Deadlock : The deadlock problem can not be solved by two phase locking.
Deadlock is a situation in which when two or more transactions have got a lock and
waiting for another locks currently held by one of the other transactions.
For example
T1 T2
Lock-X(A) Lock-X(B)
Read(A) Read(B)
A=A-50 B=B+100
Write(A) Write(B)
Delayed, Delayed,
wait for T2 wait for T1
to release to release
Lock on B Lock on A
T1 T2 T3
Read(A)
Read(B)
C=A+B
Write(C)
Read(C)
Write(C)
Read(C)
When T1 writes value of C then only T2 can read it. And when T2 writes the value of C
then only transaction T3 can read it. But if the transaction T1 gets failed then
automatically transactions T2 and T3 gets failed.
The simple two phase locking does not solve the cascading rollback problem. To solve
the problem of cascading Rollback two types of two phase locking mechanisms can be
used.
3.10.1 Types of Two Phase Locking
(1) Strict Two Phase Locking : The strict 2PL protocol is a basic two phase protocol but
all the exclusive mode locks be held until the transaction commits. That means in other
words all the exclusive locks are unlocked only after the transaction is committed. That
also means that if T1 has exclusive lock, then T1 will release the exclusive lock only after
commit operation, then only other transaction is allowed to read or write. For example -
Consider two transactions
T1 T2
W(A)
R(A)
If we apply the locks then
T1 T2
Lock-X(A)
W(A)
Commit
Unlock(A)
Lock-S(A)
R(A)
Unlock-S(A)
Database Management Systems 3 - 31 Transactions
Thus only after commit operation in T1, we can unlock the exclusive lock. This ensures
the strict serializability.
Thus compared to basic two phase locking protocol, the advantage of strict 2PL
protocol is it ensures strict serializability.
(2) Rigorous Two Phase Locking : This is stricter two phase locking protocol. Here all
locks are to be held until the transaction commits. The transactions can be seriealized in
the order in which they commit.
example - Consider transactions
T1
R(A)
R(B)
W(B)
If we apply the locks then
T1
Lock-S(A)
R(A)
Lock-X(B)
R(B)
W(B)
Commit
Unlock(A)
Unlock(B)
Thus the above transaction uses rigorous two phase locking mechanism
Example 3.10.2 Consider the following two transactions :
T1:read(A)
Read(B);
If A=0 then B=B+1;
Write(B)
T2:read(B); read(A)
If B=0 then A=A+1
Write(A)
Add lock and unlock instructions to transactions T1 and T2, so that they observe two phase
locking protocol. Can the execution of these transactions result in deadlock ?
AU : Dec.-16, Marks 6
Database Management Systems 3 - 32 Transactions
Solution :
T1 T2
Lock-S(A) Lock-S(B)
Read(A) Read(B)
Lock-X(B) Lock-X(A)
Read(B) Read(A)
if A=0 then B=B+1 if B=0 then A=A+1
Write(B) Write(A)
Unlock(A) Unlock(B)
Commit Commit
Unlock(B) Unlock(A)
This is lock-unlock instruction sequence help to satisfy the requirements for strict two
phase locking for the given transactions.
The execution of these transactions result in deadlock. Consider following partial
execution scenario which leads to deadlock.
T1 T2
Lock-S(A) Lock-S(B)
Read(A) Read(B)
Lock-X(B) Lock-X(A)
Now it will wait for T2 to Now it will wait for T1 to
release exclusive lock on A release exclusive lock on B
3.10.2 Lock Conversion
Lock conversion is a mechanism in two phase locking mechanism - which allows
conversion of shared lock to exclusive lock or exclusive lock to shared lock.
Method of Conversion :
First Phase :
o can acquire a lock-S on item
Second Phase :
o can release a lock-S
This protocol assures serializability. But still relies on the programmer to insert the
various locking instructions.
Lock-S(C)
R(C)
…
Upgrade(A)
W(A)
Unlock(A)
Unlock(B)
Unlock(C)
University Questions
1. What is concurrency control ? Explain two phase locking protocol with an example
AU : May-18, Marks 7
o Write Time stamp i.e. W-timestamp(A) - It is the youngested time stamp that
performed on write operation.
For example : Consider two transactions T1 and T2 with timestamps 10 sec and
20 sec. That means younger transaction has already performed write operation and now
older transaction is issuing the Read (A) request. This is as shown below
T T
1 2
(10 sec) older (20 sec) younger
transaction transaction
W(A)
R(A)
Then W-timestamp(A) for transaction T2= 20 sec. and TS(T1) for R(A) = 10. i.e.
For example : Consider two transactions T1 and T2 with timestamps 10 sec and
20 sec.
T1 T2
(10 sec) older (20 sec) younger
transaction transaction
R(A)
W(A)
For example : Consider two transactions T1 and T2 with timestamps 10 sec and
20 sec. That means younger transaction has already performed write operation and now
older transaction is issuing the write request. This is as shown below
T1 T2
(10 sec) older (20 sec) younger
transaction transaction
W(A)
W(A)
Here W-timestamp(A) of T2=20 sec and TS(T1) for write request =10 sec i.e.
For example just change the timestamps of T1 and T2. This can be as shown below -
T1 T2
(20 sec) (10 sec) older
younger transaction
transaction
R(A) or W(A)
W(A)
This is allowed and the write operation of T1 will be executed.
Now, the main problem arises. Now Transaction T1 is waiting for T2 to release its lock
and similarly, transaction T2 is waiting for T1 to release its lock. All activities come to a
halt state and remain at a standstill. This situation is called deadlock in DBMS.
Definition : Deadlock can be formally defined as - “ A system is in deadlock state if
there exists a set of transactions such that every transaction in the set is waiting for
another transaction in the set. “
There are four conditions for a deadlock to occur
A deadlock may occur if all the following conditions holds true.
1. Mutual exclusion condition : There must be at least one resource that cannot be
used by more than one process at a time.
2. Hold and wait condition : A process that is holding a resource can request for
additional resources that are being held by other processes in the system.
3. No preemption condition : A resource cannot be forcibly taken from a process.
Only the process can release a resource that is being held by it.
4. Circular wait condition : A condition where one process is waiting for a resource
that is being held by second process and second process is waiting for third process
….so on and the last process is waiting for the first process. Thus making a circular
chain of waiting.
Deadlock can be handled using two techniques –
1. Deadlock Prevention 2. Deadlock Detection and deadlock recovery
1. Deadlock Prevention :
For large database, deadlock prevention method is suitable. A deadlock can be
prevented if the resources are allocated in such a way that deadlock never occur. The
DBMS analyzes the operations whether they can create deadlock situation or not, If they
do, that transaction is never allowed to be executed.
There are two techniques used for deadlock prevention –
Database Management Systems 3 - 38 Transactions
(i) Wait-Die :
In this scheme, if a transaction requests for a resource which is already held with a
conflicting lock by another transaction then the DBMS simply checks the
timestamp of both transactions. It allows the older transaction to wait until the
resource is available for execution.
Suppose there are two transactions Ti and Tj and let TS(T) is a timestamp of any
transaction T. If T2 holds a lock by some other transaction and T1 is requesting for
resources held by T2 then the following actions are performed by DBMS :
o Check if TS(Ti) < TS(Tj) - If Ti is the older transaction and Tj has held some
resource, then Ti is allowed to wait until the data-item is available for
execution. That means if the older transaction is waiting for a resource which
is locked by the younger transaction, then the older transaction is allowed to
wait for resource until it is available.
o Check if TS(Ti) < TS(Tj) - If Ti is older transaction and has held some resource
and if Tj is waiting for it, then Tj is killed and restarted later with the random
delay but with the same timestamp.
Here TS(T1) i.e. Time stamp of T1 is less than TS(T3). In other words T1 is older than
T3. Hence T1 is made to wait while T3 is rolledback.
(ii) Wound- wait :
o In wound wait scheme, if the older transaction requests for a resource which
is held by the younger transaction, then older transaction forces younger one
to kill the transaction and release the resource. After some delay, the younger
transaction is restarted but with the same timestamp.
Database Management Systems 3 - 39 Transactions
o If the older transaction has held a resource which is requested by the Younger
transaction, then the younger transaction is asked to wait until older releases
it.
Suppose T1 needs a resource held by T2 and T3 also needs the resource held by T2,
with TS(T1)=5, TS(T2)=8 and TS(T3)=10, then T1 being older waits and T3 being younger
dies. After the some delay, the younger transaction is restarted but with the same
timestamp.
This ultimately prevents a deadlock to occur.
To summarize
Wait-Die Wound-wait
Older transaction needs a data older transaction waits younger transaction dies.
item held by younger
transaction
Younger transaction needs a Younger transaction dies Younger transaction dies.
data item held by older
transaction
2. Deadlock Detection :
In deadlock detection mechanism, an algorithm that examines the state of the
system is invoked periodically to determine whether a deadlock has occurred or
not. If deadlock is occurrence is detected, then the system must try to recover from
it.
Deadlock detection is done using wait for graph method.
Wait For Graph
o In this method, a graph is created based on the transaction and their lock. If
the created graph has a cycle or closed loop, then there is a deadlock.
o The wait for the graph is maintained by the system for every transaction
which is waiting for some data held by the others. The system keeps checking
the graph if there is any cycle in the graph.
o This graph consists of a pair G = (V, E), where V is a set of vertices and E is a
set of edges.
For example – Consider following transactions, We will draw a wait for graph
for this scenario and check for deadlock.
T1 T2
R(A)
R(A)
W(A)
R(B)
W(A)
W(B)
Step 2 : We find the Read-Write pair from two different transactions reading from top to
bottom. If such as pair is found then we will add the edges between corresponding
directions. For instance -
Database Management Systems 3 - 41 Transactions
Step 3 :
As cycle is detected in the wait-for graph there is no need to further process. The
deadlock is present in this transaction scenario.
Example 3.12.1 Give an example of a scenario where two phase locking leads to deadlock.
AU : May-09, Marks 4
Solution : Following scenario of execution of transactions can result in deadlock.
T1 T2
Lock – S (A)
Lock – S (B)
Read (B)
These two
instructions Read (A)
cause a deadlock Lock – X (B)
situation.
Lock – X (A)
In above scenario, transaction T1 makes an exclusive lock on data item B and then
transaction T2 makes an exclusive lock on data item A. Here unless and until T1 does not
give up the lock (i.e. unlock) on B; T2 cannot read / write it. Similarly unless and until T2
does not give up the lock on A; T1 cannot read or write on A.
University Questions
Logical Error :
i) This error is caused due to internal conditions such as bad input, data not
found, overflow of resource limit and so on.
ii) Due to logical error the transaction can not be continued.
System Error :
i) When the system enters in an undesired state and then the transaction can not
be continued then this type of error is called as system error.
2) System Crash : The situation in which there is a hardware malfunction, or a bug in
the database software or the operating system, and because of which there is a loss
of the content of volatile storage, and finally the transaction processing come to a
halt is called system crash.
3) Disk Failure : A disk block loses its content as a result of either a head crash or
failure during a data-transfer operation. The backup of data is maintained on the
secondary disks or DVD to recover from such failure.
3.13.2 Storage
A DBMS stores the data on external storage because the amount of data is very huge
and must persist across program executions.
The storage structure is a memory structure in the system. It has following categories -
1) Volatile :
Volatile memory is a primary memory in the system and is placed along with the
CPU.
These memories can store only small amount of data, but they are very fast. For
example - main memory, cache memory.
A volatile storage cannot survive system crashes.
That means data in these memories will be lost on failure.
Database Management Systems 3 - 43 Transactions
2) Non Volatile :
Non volatile memory is a secondary memory and is huge in size. For example : Hard
disk, Flash memory, magnetic tapes.
These memories are designed to withstand system crashes.
3) Stable :
The log contains various fields as shown in following Fig. 3.13.1. This structure is
for <update> operation
Transaction Data Item Old Value of New Value
ID(Ti) Name Data Item of Data Item
Log Database
<T1, Start>
<T1, A, 90>
<T1, B, 210>
<T1, Commit>
A = 90
B = 210
<T2, Start>
<T2, C, 280>
<T2, Commit>
C = 280
a) Just after write (B, b)
Just after write operation, no commit record appears in log. Hence no write
operation is performed on database. So database retains only old values. Hence
A = 100 and B = 200 respectively.
Thus the system comes back to original position and no redo operation take place.
Database Management Systems 3 - 47 Transactions
T1 T2
Read(A,a) Read(C,c)
a=a-10 c=c-20
Write(A,a) Write(C,c)
Read(B,b)
b=b+10
Write(B,b)
Here T1 and T2 are executed serially. Initially A=100, B=200 and C=300
Then using the immediate Database modification approach the result of above three
scenarios can be elaborated as follows :
The contents of log and database is as follows :
Log Database
<T1,Start>
<T1,A,100,90>
A=90
<T1,B,200,210>
B=210
<T1,Commit>
<T2,Start>
<T2,C,300,280>
C=280
<T2,Commit>
ii) REDO (Ti) : The transaction Ti needs to be redone if the log contains both <Ti,Start>
and <Ti,Commit>. In this phase, the data item values are set to the new values as
per the transaction. After a failure has occurred log record is consulted to determine
which transaction need to be redone.
Database Management Systems 3 - 49 Transactions
a) Just after Write (B,b) : When system comes back from this crash, it sees that there is
<T1, Start> but no <T1, Commit>. Hence T1 must be undone. That means old values
of A and B are restored. Thus old values of A and B are taken from log and both the
transaction T1 and T2 are re-executed.
b) Just after Write (C,c): Here both the redo and undo operations will occur
c) Undo : When system comes back from this crash, it sees that there is <T2, Start> but
no <T2, Commit>. Hence T2 must be undone. That means old values of C is
restored. Thus old value of C is taken from log and the transaction T2 is
re-executed.
d) Redo : The transaction T1 must be done as log contains both the <T1,Start> and
<T1,Commit>
So A=90, B=210 and C=300
e) Just after <T2,Commit> : When the system comes back from this crash, it sees that
there are two transaction T1 and T2 with both start and commit points. That means
T1 and T2 need to be redone. So A=90, B=210 and C=280
3.14 Save Points
The COMMIT, ROLLBACK, and SAVEPOINT are collectively considered as
Transaction Commands
(1) COMMIT : The COMMIT command is used to save permanently any transaction to
database.
When we perform, Read or Write operations to the database then those changes can
be undone by rollback operations. To make these changes permanent, we should
make use of commit
(2) ROLLBACK : The ROLLBACK command is used to undo transactions that have
not already saved to database. For example
Consider the database table as
RollNo Name
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
Following command will delete the record from the database, but if we immediately
performs ROLLBACK, then this deletion is undone.
For instance –
DELETE FROM Student
WHERE RollNo =2;
ROLLBACK;
Then the resultant table will be
RollNo Name
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
(3) SAVEPOINT : A SAVEPOINT is a point in a transaction when you can roll the
transaction back to a certain point without rolling back the entire transaction. The
SAVEPOINT can be created as
SAVEPOINT savepoint_name;
Then we can ROLLBACK to SAVEPOIT as
ROLLBACK TO savepoint_name;
For example – Consider Student table as follows –
RollNo Name
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
RollNo Name
1 AAA
2 BBB
3 CCC
Thus the effect of deleting the record having RollNo 2, and RollNo3 is undone.
3.15 Isolation Levels
The consistency of the database is maintained with the help of isolation
property(one of the property from ACID properties ) of transaction.
The transaction should take place in a system in such a way that it is the only
transaction that is accessing the resources in a database system at particular
instance.
Isolation levels defines the degree to which a transaction must be isolated from
the data modifications made by any other transaction in the database system.
There are four levels of transaction isolation defined by SQL -
o Serializable :
This is the Highest isolation level.
Serializable execution is defined to be an execution of operations in which
concurrently executing transactions appears to be serially executing.
Database Management Systems 3 - 52 Transactions
o Repeatable Read :
This is the most restrictive isolation level.
The transaction holds read locks on all rows it references.
It holds write locks on all rows it inserts, updates, or deletes.
Since other transaction cannot read, update or delete these rows, it avoids non
repeatable read.
o Read Committed :
This isolation level allows only committed data to be read.
Thus it does not allows dirty read (i.e. one transaction reading of data
immediately after written by another transaction).
The transaction hold a read or write lock on the current row, and thus prevent
other rows from reading, updating or deleting it.
o Read Uncommitted :
It is lowest isolation level.
In this level, one transaction may read not yet committed changes made by other
transaction.
This level allows dirty reads.
In this level transactions are not isolated from each other.
3.16 SQL Facilities for Concurrency and Recovery AU : May-14, Marks 8
Step 3 : In select query it will take only commited values of table. If any transaction is
opened and incompleted on table in others sessions then select query will wait till no
transactions are pending on same table. Read Committed is the default transaction
isolation level.
Session 1:
BEGIN TRAN
UPDATE emp SET Salary=999 WHERE ID=1
Database Management Systems 3 - 54 Transactions
University Question
Ans. : A transaction can be defined as a group of tasks that form a single logical unit.
Q.2 What does time to commit mean ? AU : May-04
Ans. :
The COMMIT command is used to save permanently any transaction to database.
Database Management Systems 3 - 55 Transactions
When we perform, Read or Write operations to the database then those changes
can be undone by rollback operations. To make these changes permanent, we
should make use of commit
Q.3 What are the various properties of transaction that the database system maintains to
ensure integrity of data. AU : Dec.-04
OR
Q.4 What are ACID properties ? AU : May-05,06,08,13,15,Dec.-07,14,17
Ans. : In a database, each transaction should maintain ACID property to meet the
consistency and integrity of the database. These are
(1) Atomicity (2) Consistency (3) Isolation (4) Durability
Q.5 Give the meaning of the expression ACID transaction. AU : Dec.-08
Ans. : The expression ACID transaction represents the transaction that follows the ACID
Properties.
Q.6 State the atomicity property of a transaction. AU : May-09,13
Ans. : This property states that each transaction must be considered as a single unit and
must be completed fully or not completed at all.
No transaction in the database is left half completed.
Q.7 What is meant by concurrency control ? AU : Dec.-15
Ans. : A mechanism which ensures that simultaneous execution of more than one
transactions does not lead to any database inconsistencies is called concurrency control
mechanism.
Q.8 State the need for concurrency control. AU : Dec.-17
OR
Q.9 Why is it necessary to have control of concurrent execution of transactions ? How is it
made possible ? AU : Dec.-02
Ans. : Serializability is a concept that helps to identify which non serial schedule and find
the transaction equivalent to serial schedule.
It is tested using precedence graph technique.
Q.12 What is serializable schedule ? AU : May-17
Ans. : The schedule in which the transactions execute one after the other is called serial
schedule. It is consistent in nature. For example : Consider following two transactions T1
and T2
T1 T2
R(A)
W(A)
R(B)
W(B)
R(A)
W(A)
R(B)
W(B)
All the operations of transaction T1 on data items A and then B executes and then in
transaction T2 all the operations on data items A and B execute. The R stands for Read
operation and W stands for write operation.
Q.13 When are two schedules conflict equivalent ? AU : Dec.-08
T1 T2 T1 T2
Read(A) Read(A)
Write(A) Write(A)
Read(A) Read(B)
Write(A) Write(B)
Read(B) Read(A)
Database Management Systems 3 - 57 Transactions
Write(B) Write(A)
Read(B) Read(B)
Write(B) Write(B)
Ans. : The two phase locking is a protocol in which there are two phases :
i) Growing Phase (Locking Phase) : It is a phase in which the transaction may obtain
locks but does not release any lock.
ii) Shrinking Phase (Unlocking Phase) : It is a phase in which the transaction may
release the locks but does not obtain any new lock.
Q.15 What is the difference between shared lock and exclusive lock? AU : May-18
Ans:
Shared lock is used for when the transaction Exclusive lock is used when the transaction
wants to perform read operation. wants to perform both read and write operation.
Multiple shared lock can be set on a transactions Only one exclusive lock can be placed on a data
simultaneously. item at a time.
Using shared lock data item can be viewed. Using exclusive lock data can be inserted or
deleted.
Q.16 What type of lock is needed for insert and delete operations. AU : May-17
Ans. : Benefits :
1. This ensure that any data written by an uncommitted transaction are locked in
exclusive mode until the transaction commits and preventing other transaction from
reading that data .
2. This protocol solves dirty read problem.
Database Management Systems 3 - 58 Transactions
Disadvantage:
1. Concurrency is reduced.
Q.18 What is rigorous two phase locking protocol ? AU : Dec.-13
Ans. : This is stricter two phase locking protocol. Here all locks are to be held until the
transaction commits.
Q.19 Differentiate strict two phase locking and rigourous two phase locking protocol.
AU : May-16
Ans. :
In Strict two phase locking protocol all the exclusive mode locks be held until the
transaction commits.
The rigourous two phase locking protocol is stricter than strict two phase locking
protocol. Here all locks are to be held until the transaction commits.
Q.20 Define deadlock. AU : May-08,09,14
Ans. : Deadlock is a situation in which when two or more transactions have got a lock
and waiting for another locks currently held by one of the other transactions.
Q.21 List four conditions for deadlock. AU : Dec.-16
Ans. : 1. Mutual exclusion condition 2. Hold and wait condition 3.No preemption
condition 4.Circular wait condition
Q.22 Why is recovery needed ? AU : May-09
Ans. :
A recovery scheme that can restore the database to the consistent state that
existed before the failure.
Due to recovery mechanism, there is high availability of database to its users.
UNIT - IV
Syllabus
RAID - File Organization - Organization of Records in Files - Indexing and Hashing - Ordered
Indices - B+ tree Index Files - B tree Index Files - Static Hashing - Dynamic Hashing - Query
Processing Overview - Algorithms for SELECT and JOIN operations - Query optimization using
Heuristics and Cost Estimation.
Contents
4.1 RAID......................................................................... May-07,15,16,17,
................................................................................. Dec.-06,13,14,15,16,........... Marks 16
4.2 File Organization ...................................................... Dec.-08, .............................. Marks 10
4.3 Organization of Records in File
4.4 Indexing and Hashing .............................................. Dec.-11,................................. Marks 6
4.5 Ordered Indices........................................................ Dec.-04, 08,15, May-06, .... Marks 10
4.6 B+ Tree Index Files .................................................. May-03, 06,16, Dec.-17, ... Marks 16
4.7 B Tree Index Files .................................................... May-08, Dec.-12,14, ........... Marks 16
4.8 Hashing .................................................................... Dec.- 04, 05, May-05, 14, ..... Marks 8
4.9 Static Hashing .......................................................... May-06, ................................. Marks 8
4.10 Dynamic Hashing ..................................................... May-04,07,18, Dec.-08,17 .. Marks 13
4.11 Query Processing Overview..................................... May-14, 16, 18, .................. Marks 16
4.12 Measure of Query Cost
4.13 Algorithms for SELECT Operation
4.14 Algorithms for JOIN Operation
4.15 Query Optimization using Heuristics
and Cost Estimation ................................................. Dec.-13, 16, May-15, ......... Marks 16
4.16 Two Marks Questions with Answers
(4 - 1)
Database Management Systems 4-2 Implementation Techniques
o There is no duplication of data in this level so once a block is lost then there is no
way recover it.
o The main priority of this level is performance and not the reliability.
Level : RAID 1
o This level makes use of mirroring. That means all data in the drive is duplicated to
another drive.
o This level provides 100% redundancy in case of failure.
o Only half space of the drive is used to store the data. The other half of drive is just
a mirror to the already stored data.
o The main advantage of this level is fault tolerance. If some disk fails then the other
automatically takes care of lost data.
Database Management Systems 4-4 Implementation Techniques
Level : RAID 2
o This level makes use of mirroring as well as stores Error Correcting Codes (ECC)
for its data striped on different disks.
o The data is stored in separate set of disks and ECC is stored another set of disks.
o This level has a complex structure and high cost. Hence it is not used
commercially.
Level : RAID 3
o This level consists of byte-level stripping with dedicated parity. In this level, the
parity information is stored for each disk section and written to a dedicated parity
drive.
o We can detect single errors with a parity bit. Parity is a technique that checks
whether data has been lost or written over when it is moved from one place in
storage to another.
o In case of disk failure, the parity disk is accessed and data is reconstructed from
the remaining devices. Once the failed disk is replaced, the missing data can be
restored on the new disk.
Database Management Systems 4-5 Implementation Techniques
Level : RAID 4
o RAID 4 consists of block-level stripping with a parity disk.
o Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping.
Database Management Systems 4-6 Implementation Techniques
Level : RAID 5
o RAID 5 is a modification of RAID 4.
o RAID 5 writes whole data blocks onto different disks, but the parity bits generated
for data block stripe are distributed among all the data disks rather than storing
them on a different dedicated disk.
RAID : Level 6
o RAID 6 is a extension of Level 5
o RAID 6 writes whole data blocks onto different disks, but the two independent
parity bits generated for data block stripe are distributed among all the data
disks rather than storing them on a different dedicated disk.
o Two parities provide additional fault tolerance.
o This level requires at least four disks to implement RAID.
Database Management Systems 4-7 Implementation Techniques
University Questions
1. Explain what a RAID system is ? How does it improve performance and reliability. Discuss the level 3
and level 4 of RAID. AU : May-17, Marks (3+4+6)
A file organization is a method of arranging records in a file when the file is stored on
disk.
A file is organized logically as a sequence of records.
Record is a sequence of fields.
There are two types of records used in file organization (1) Fixed Length Record
(2) Variable Length Record.
Database Management Systems 4-8 Implementation Techniques
1 1 1 1 A A A 1
0 0 0 1 1 1 1 1 1 1 1 1 1
Thus total 29 bytes are required to store.
Advantage :
Access is fast because the computer knows where each record starts.
Disadvantage :
(1) Due to fixed size, some larger sized record may cross the block boundaries. That
means part of record will be stored in one block and other part of the record may be
stored in some another block. Thus we may require two block access for each read or
write.
(2) It is difficult to delete the record from this structure. If some intermediate record is
deleted from the file then the vacant space must be occupied by next subsequent
records.
When a record is deleted,we could move the record that came after it into the space
formerly occupied by the deleted record. But this may require moving of multiple
records to occupy the vacant space. This is an undesirable solution to fill up the
vacant space of deleted records.
Database Management Systems 4-9 Implementation Techniques
Another approach is to use a file header. At the beginning of the file, we allocate a
certain number of bytes as a file header. The header will contain information such as
address of the first record whose contents are deleted. We use this first record to store
the address of the second available record and so on. Thus the stored record addresses
are referred as pointer while the deleted records thus form a linked list which is called
as free list.
For example - Consider the employee record in a file is -
On insertion of a new record, we use the record pointed to by the header. We change
the header pointer to point to the next available record. If no space is available, we
add the new record to the end of the file.
Database Management Systems 4 - 10 Implementation Techniques
The figure also illustrates the use of a null bitmap, which indicates which attributes of
the record have a null value.
The variable length record can be stored in blocks. A specialized structure called
slotted page structure is commonly used for organizing the records within a block.
This structure is as shown by following Fig. 4.2.1.
Database Management Systems 4 - 11 Implementation Techniques
Fig. 4.2.1
University Question
1. Describe different types of file organization. Explain using a sketch of each of the, with their advantages
and disadvantages. AU : Dec.-08, Marks 10
(1) Heap File Organization : Any record can be placed anywhere in the file where
there is a space. There is no ordering for placing the records in the file. Generally
single file is used.
(2) Sequential File Organization : Records are stored in sequential order based on
the value of search key.
Database Management Systems 4 - 12 Implementation Techniques
(3) Hashing File Organization : A hash function is used to obtain the location of the
record in the file. Based on the value returned by hash function, the record is
stored in the file.
4.3.1 Sequential File Organization
The sequential file organization is a simple file organization method in which the
records are stored based on the search key value.
For example – Consider following set of records stored according to the RollNo of
student. Note that we assume here that the RollNo is a search key.
Fig. 4.3.1
However, maintaining physical sequential order is very difficult as there can be
several insertions and deletions.
We can maintain the deletion by next pointer chain.
For insertion following rules can be applied -
o If there is free space insert record there.
Database Management Systems 4 - 13 Implementation Techniques
Ankita 55
Ankita 67
Ankita 86
Prajkta 91
This type of file organization is good for join operations such as Student Course.
This file organization results in variable size records.
The pointer chain can be added to the above records to keep track of address of next
record. It can be as shown in following Fig. 4.3.2
Database Management Systems 4 - 14 Implementation Techniques
Fig. 4.3.2
An index is a data structure that organizes data records on the disk to make the
retrieval of data efficient.
The search key for an index is collection of one or more fields of records using which
we can efficiently retrieve the data that satisfy the search conditions.
The indexes are required to speed up the search operations on file of records.
There are two types of indices -
o Ordered Indices : This type of indexing is based on sorted ordering values.
o Hash Indices : This type of indexing is based on uniform distribution of values
across range of buckets. The address of bucket is obtained using the hash function.
There are several techniques of for using indexing and hashing. These techniques are
evaluated based on following factors -
o Access Types : It supports various types of access that are supported efficiently.
o Access Time : It denotes the time it takes to find a particular data item or set items.
o Insertion Time : It represents the time required to insert new data item.
o Deletion Time : It represents the time required to delete the desired data item.
o Space overhead : The space is required to occupy the index structure. But
allocating such extra space is worth to achieve improved performance.
Example 4.4.1 Since indices speed query processing. Why might they not be kept on several
search keys? List as many reasons as possible. AU : Dec.-11, Marks 6
d. For queries which involve conditions on several search keys, efficiency might not be
bad even if only some of the keys have indices on them.
Therefore database performance is improved less by adding indices when many
indices already exist.
Once if you are able to locate the first entry of the record containing block, other
entries are stored continuously. For example if we want to search a record for RegNo
11AS32 we need not have to search for the entire data file. With the help of primary
index structure we come to know the location of the record containing the RegNo
11AS30, now when the first entry of block 30 is located, then we can easily locate the
entry for 11AS32.
We can apply binary search technique. Suppose there are n = 300 blocks in a main data
file then the number of accesses required to search the data file will be log2 n + 1 = (log2
300) + 1 9
If we use primary index file which contains at the most n = 3 blocks then using binary
search technique, the number of accesses required to search using the primary index
file will be log2 n + 1 = (log2 3 ) + 1 3
Database Management Systems 4 - 16 Implementation Techniques
This shows that using primary index the access time can be deduced to great extent.
Clustered Index :
In some cases, the index is created on non-primary key columns which may not be
unique for each record. In such cases, in order to identify the records faster, we will
group two or more columns together to get the unique values and create index out of
them. This method is known as clustering index.
When a file is organized so that the ordering of data records is the same as the
ordering of data entries in some index then say that index is clustered, otherwise it is
an unclustered index.
Note that, the data file need to be in sorted order.
Basically, records with similar characteristics are grouped together and indexes are
created for these groups.
For example, students studying in each semester are grouped together. i.e.; 1st
semester students, 2nd semester students, 3rd semester students etc. are grouped.
For example :
2) Sparse index :
Index records are created only for some of the records.
To locate a record, we find the index record with the largest search key value less
than or equal to the search key value we are looking for.
We start at that record pointed to by the index record, and proceed along the
pointers in the file (that is, sequentially) until we find the desired record.
For example -
The index file usually occupies considerably less disk blocks than the data file because
its entries are much smaller.
A binary search on the index yields a pointer to the file record.
The types of single level indexing can be primary indexing, clustering index or
secondary indexing.
Example : Following Fig. 4.5.5 represents the single level indexing -
Multilevel indexing :
There is an immense need to keep the index records in the main memory so as to
speed up the search operations. If single-level index is used, then a large size index
cannot be kept in memory which leads to multiple disk accesses.
Multi-level Index helps in breaking down the index into several smaller indices in
order to make the outermost level so small that it can be saved in a single disk block,
which can easily be accommodated anywhere in the main memory.
The multilevel indexing can be represented by following Fig. 4.5.6
For example -
University Questions
1 Illustrate indexing and hashing techniques with suitable examples. AU : Dec.-15, Marks 8
The B+ tree is similar to binary search tree. It is a balanced tree in which the internal
nodes direct the search.
The leaf nodes of B+ trees contain the data entries.
Structure of B+ Tree
It contains up to n − 1 search-key values K1, K2, . . . , Kn−1, and n pointers P1, P2, . . . , Pn.
The search-key values within a node are kept in sorted order; thus, if i < j, then Ki < Kj .
Database Management Systems 4 - 21 Implementation Techniques
To retrieve all the leaf pages efficiently we have to link them using page pointers. The
sequence of leaf pages is also called as sequence set.
Following Fig. 4.6.1 represents the example of B+ tree.
The B+ tree is called dynamic tree because the tree structure can grow on insertion of
records and shrink on deletion of records.
Characteristics of B+ Tree
Following are the characteristics of B+ tree.
1) The B+ tree is a balanced tree and the operations insertions and deletion keeps the
tree balanced.
2) A minimum occupancy of 50 percent is guaranteed for each node except the root.
3) Searching for a record requires just traversal from the root to appropriate leaf.
4.6.1 Insertion Operation
Algorithm for insertion :
Step 1 : Find correct leaf L.
Step 2 : Put data entry onto L.
i) To split index node, redistribute entries evenly, but push up middle key. (Contrast
with leaf splits.)
Step 4 : Splits “grow” tree; root split increases height.
Step 2 : Now if we insert 22, the sequence will be 22 , 23, 30, 31, 32. The middle key 30,
will go up.
Step 4 : Insert 29. The sequence becomes 22, 23, 24, 28, 29. The middle key 24 will go up.
Thus we get the B+ tree.
Database Management Systems 4 - 23 Implementation Techniques
Example 4.6.2 Construct B+ tree to insert the following (order of the tree is 3)
26,27,28,3,4,7,9,46,48,51,2,6
Solution : Order means maximum number of children allowed by each node. Hence
order 3 means at the most 2 key values are allowed in each node.
Step 1 : Insert 26, 27 in ascending order
Step 2 : Now insert 28. The sequence becomes 26,27,28. As the capacity of the node is full,
27 will go up. The B+ tree will be,
Step 4 : Insert 4. The sequence becomes 3,4, 26. The 4 will go up. The partial B+ tree will
be -
Step 5 : Insert 7. The sequence becomes 4,7,26. The 7 will go up. Again from 4,7,27 the 7
will go up. The partial B+ Tree will be,
Database Management Systems 4 - 24 Implementation Techniques
Step 6 : Insert 9. By inserting 7,9, 26 will be the sequence. The 9 will go up. The partial B+
tree will be,
Step 7 : Insert 46. The sequence becomes 27,28,46. The 28 will go up. Now the sequence
becomes 9, 27, 28. The 27 will go up and join 7. The B+ Tree will be,
Step 8 : Insert 48. The sequence becomes 28,46,48. The 46 will go up. The B+ Tree will
become,
Step 9 : Insert 51. The sequence becomes 46,48,51. The 48 will go up. Then the sequence
becomes 28, 46, 48. Again the 46 will go up. Now the sequence becomes 7,27, 46. Now the
27 will go up. Thus the B+ tree will be
Database Management Systems 4 - 25 Implementation Techniques
Step 11 : Insert 6. The insertion can be made in a vacant node of 7(the leaf node). The final
B+ tree will be,
Database Management Systems 4 - 26 Implementation Techniques
Step 2 : If we insert 7, the sequence becomes 2, 3, 5, 7. Since each node can accommodate
at the most three key, the 5 will go up, from the sequence 2, 3, 5 , 7.
Database Management Systems 4 - 27 Implementation Techniques
Step 4 : Insert 17. The sequence becomes 5,7, 11,17. The element 11 will go up. Then the
partial B+ tree becomes,
Step 6 : Insert 23. The sequence becomes 11,17,19,23. The 19 will go up.
Database Management Systems 4 - 28 Implementation Techniques
Step 8 : Insert 31. The sequence becomes 19,23,29, 31. The 29 will go up. Then at the upper
level the sequence becomes 5,11,19,29. Hence again 19 will go up to maintain the capacity
of node (it is four pointers= three key values at the most). Hence the complete B+ tree will
be,
(a) Insertion of 9 : It is very simple operation as the node containing 5,7 has one space
vacant to accommodate. The B+ tree will be,
Database Management Systems 4 - 29 Implementation Techniques
(b) Insert 10 : If we try to insert 10 then the sequence becomes 5,7,9,10. The 9 will go up.
The B+ tree will then become –
(c) Insert 8 : Again insertion of 8 is simple. We have a vacant space at node 5,7. So we just
insert the value over there. The B+ tree will be-
(d) Delete 23 : Just remove the key entry of 23 from the node 19,23. Then merge the
sibling node to form a node 19,29,31. Get down the entry of 11 to the leaf node.
Attach the node of 11,17 as a left child of 19.
Database Management Systems 4 - 30 Implementation Techniques
(e) Delete 19 : Just delete the entry of 19 from the node 19,29,31. Delete the internal node
key 19. Copy the 29 up as an internal node as it is an inorder successor node.
1. In B+ tree the data is stored in leaf node so searching of any data requires scanning
only of leaf node alone.
2. Data is ordered in linked list.
3. Any record can be fetched in equal number of disk accesses.
4. Range queries can be performed easily as leaves are linked up.
5. Height of the tree is less as only keys are used for indexing.
6. Supports both random and sequential access.
Demerits of B+ Index Tree Structure
University Questions
1 Explain the B+ tree indexes on multiple keys with suitable example. AU : Dec.-17, Marks 7
3. What are the merits and demerits of B+ tree index structures. AU : May-03, Marks 4
Step 2 : If we insert the value 30. The sequence becomes 10,20,30. As only two key values
are allowed in each node (being order 3), the 20 will go up.
Step 4 : Insert 12. The sequence becomes 10, 12, 15. The middle element 12 will go up.
Step 5 : Insert 40
Step 6 : Insert 50. The sequence becomes 30,40,50. The 40 will go up. But again it forms
12,20,40 sequence and then 20 will go up. Thus the final B Tree will be,
B Trees B+ Trees
In a B-tree you can store both keys and data in In a B+ tree you have to store the data in the leaf
the internal and leaf nodes. nodes only.
It wastes space. It does not waste space.
The leaf node cannot store using linked list. The leaf nodes are connected using linked list.
Searching becomes difficult in B-tree as data Searching of any data in a B+ tree is very easy
cannot be found in the leaf node. because all data is found in leaf nodes.
The B-tree does not store redundant search key. The B-tree stores redundant search key.
University Questions
1. Explain detail about i) B+ tree index ii) B tree index files
AU : Dec.-14, Marks 16, May-08, Marks 10
2. Write down detailed notes on ordered indices and B-tree index files AU : Dec.-12, Marks 16
Database Management Systems 4 - 33 Implementation Techniques
Hash file organization method is the one where data is stored at the data blocks
whose address is generated by using hash function.
The memory location where these records are stored is called as data block or bucket.
This bucket is capable of storing one or more records.
The hash function can use any of the column value to generate the address. Most of
the time, hash function uses primary key to generate the hash index - address of the
data block.
Hash function can be simple mathematical function to any complex mathematical
function.
For example - Following figure represents the records of student can be searched
using hash based indexing. In this example the hash function is based on the age field
of the record. Here the index is made up of data entry k* which is actual data record.
For a hash function the age is converted to binary number format and the last two
digits are considered to locate the student record.
For example : Consider hash function as key mod 5. The hash table of size 5.
3) Bucket : The hash function H(key) is used to map several dictionary entries in the
hash table. Each position of the hash table is called bucket.
4) Collision : Collision is situation in which hash function returns the same address
for more than one record.
For example :
5) Probe : Each calculation of an address and test for success is known as a probe.
6) Synonym : The set of keys that has to the same location are called synonyms. For
example - In above given hash table computation 25 and 55 are synonyms.
7) Overflow : When hash table becomes full and new record needs to be inserted
then it is called overflow.
For example -
University Questions
In this method of hashing, the resultant data bucket address will be always same.
Index Stud_RollNo
0 34780
1 34781
2 34782
3 34783
4 34784
5 34785
6 34786
7 34787
8 34788
9 34789
If there is no space for some data entry then we can allocate new overflow page, put
the data record onto that page and add the page to overflow chain of the bucket. For
example if we want to add the Stud_RollNo= 35111 in above hash table then as there
is no space for this entry and the hash address indicate to place this record at index 1,
we create overflow chain as shown in Table 4.9.1.
Database Management Systems 4 - 36 Implementation Techniques
Example 4.9.1 Why is hash structure not the best choice for a search key on which range of
queries are likely ? AU : May-06, Marks 8
Solution :
A range query cannot be answered efficiently using a hash index, we will have to read
all the buckets.
This is because key values in the range do not occupy consecutive locations in the
buckets, they are distributed uniformly and randomly throughout all the buckets.
Advantages of Static Hashing
(1) It is simple to implement.
(2) It allows speedy data storage.
Disadvantages of Static Hashing
There are two major disadvantages of static hashing :
1) In static hashing, there are fixed number of buckets. This will create a problematic
situation if the number of records grow or shrink.
2) The ordered access on hash key makes it inefficient.
4.9.1 Open Hashing
The open hashing is a form of static hashing technique. When the collision occurs, that
means if the hash key returns the same address which is already allocated by some data
record, then the next available data block is used to enter new record instead of
overwriting the old record. This technique is also called as linear probing. For example
Consider insertion of record 105 in the hash table below with the hash function h (key)
mod 10.
Index Stud_RollNo
0 10
1 1
2 22
3
4
5 55
6 106
7
8 88
9 19
Database Management Systems 4 - 37 Implementation Techniques
Index Stud_RollNo
0 10
1 1
2 22
3
4
5 55
6 106
7 105
8 88
9 19
Advantages :
1) It is faster technique.
2) It is simple to implement.
Disadvantages:
1) It forms clustering, as the record is just inserted to next free available slot.
2) If the hash table gets full then the next subsequent records can not be
accommodated.
The problem with static hashing is that it does not expand or shrink dynamically as
the size of the database grows or shrinks.
Dynamic hashing provides a mechanism in which data buckets are added and
removed dynamically and on-demand.
The split image of bucket A i.e. A2 and old bucket A are based on last two bits i.e. 00.
Here we need two data pages, to adjacent additional data record. Therefore here it is
necessary to double the directory using three bits instead of two bits. Hence,
There will be binary versions for buckets A and A2 as 000 and 100.
In extendible hashing, last bits d is called global depth for directory and d is called
local depth for data pages or buckets. After insetion of 20*, the global depth becomes
3 as we consider last three bits and local depth of A and A2 buckets become 3 as we
are considering last three bits for placing the data records. Refer Fig. 4.10.3.
(Note : Student should refer binary values given in Fig. 4.10.2, for understanding insertion
operation)
Database Management Systems 4 - 40 Implementation Techniques
Suppose if we want to insert 11*, it belongs to bucket B, which is already full. Hence
let us split bucket B into old bucket B and split image of B as B2.
The local depth of B and B2 now becomes 3.
Now for bucket B, we get and 1 = 001
11 = 10001
For bucket B2, we get
5 = 101
29 = 11101
and 21 =10101
8 mod 8 = 0 = 000
9 mod 8 = 1 = 001
12 mod 8 = 4 = 100
17 mod 8 = 1 = 001
28 mod 8 = 4 = 100
The extendible hash table will be,
Hence we will extend the table by assuming the bit size as 3. The above indicated bucket
A will split based on 001 and 101.
Database Management Systems 4 - 43 Implementation Techniques
a) Insert 2 will be 2 mod 8 = 2 = 010. If we consider last two digits i.e. 10 then there is no
bucket. So we get,
Database Management Systems 4 - 44 Implementation Techniques
b) Insert 24 : 24 mod 8 = 0 = 000. The bucket in which 24 can be inserted is 8, 12, 28. But
as this bucket is full we split it in two buckets based on digits 000 100.
Database Management Systems 4 - 45 Implementation Techniques
c) Delete 5 : On deleting 5, we get one bucket pointed by 101 as empty. This will also
result in reducing the local depth of the bucket pointed by 001. Hence we get,
Database Management Systems 4 - 46 Implementation Techniques
(d) Delete 12 : We will simply delete 12 from the corresponding bucket there can not be
any merging of buckets on deletion. The result of deletion is as given below -
Database Management Systems 4 - 47 Implementation Techniques
1. The number of buckets are fixed. The number of buckets are not fixed.
3. Open hashing and Closed hashing are Extendible hashing and linear hashing
forms of static hashing. are forms of dynamic hashing.
4. Space overhead is more. Minimum space overhead due to
dynamic nature.
5. As file grows the performance of static There is no degradation in performance
hash function decreases. when the file grows.
6. The bucket address table is not required. The bucket address table is required.
7. The bucket is directly accessed. The bucket address table is used to access
the bucket.
Database Management Systems 4 - 48 Implementation Techniques
University Questions
1. What is hashing? Explain static hashing and dynamic hashing with an example.
AU : May-18, Marks 13
2. Explain the distinction between static and dynamic hashing. Discuss the relative merits of each technique
in database applications. AU : Dec.-17, Marks 13
3. Describe briefly static and dynamic hashing. AU : May-04, Marks 8, Dec.-08, Marks 6
4. Explain various hashing techniques. AU : May-07, Marks 8
Query processing is a collection of activities that are involved in extracting data from
database.
During query processing there is translation high level database language queries into
the expressions that can be used at the physical level of filesystem.
There are three basic steps involved in query processing and those are –
1. Parsing and Translation
o In this step the query is translated into its internal form and then into relational
algebra.
o Parser checks syntax and verifies relations.
o For instance - If we submit the query as,
SELECT RollNo, name
FROM Student
HAVING RollNo=10
Then it will issue a syntactical error message as the correct query should be
SELECT RollNo, name
FROM Student
HAVING RollNo=10
Thus during this step the syntax of the query is checked so that only correct and verified
query can be submitted for further processing.
2. Optimization :
o During this process the query evaluation plan is prepared from all the relational
algebraic expressions.
o The query cost for all the evaluation plans is calculated.
o Amongst all equivalent evaluation plans the one with lowest cost is chosen.
Database Management Systems 4 - 49 Implementation Techniques
o Cost is estimated using statistical information from the database catalog, such as
the number of tuples in each relation, size of tuples, etc.
3. Evaluation
o The query-execution engine takes a query-evaluation plan,executes that plan, and
returns the answers to the query.
The above describe steps are represented by following Fig. 4.11.1 –
Step 2 :
Query Evaluation Plan : To specify fully how to evaluate a query, we need not only to
provide the relational-algebra expression, but also to annotate it with instructions
specifying how to evaluate each operation. For that purpose, using the order of evaluation
of queries, two query evaluation plans are prepared. These are as follows
Database Management Systems 4 - 50 Implementation Techniques
University Questions
2. What is query optimization? Outline the steps in query optimization. AU : May-18, Marks 13
Cost to write a block is greater than cost to read a block because data is read back after
being written to ensure that the write was successful.
We use number of block transfers from disk and number of disk seeks to estimate
the Query cost.
Let,
o b be the number to blocks
o S be the number of Seeks
o tT is average time required to transfer a block of data, in seconds
o tS is average block access time in seconds.
Then query cost can be computed using following formula b*tT+S*tS
Scan each file block and test all records to see whether they satisfy the selection
condition
Cost = br block transfers + 1 seek
Where,
br denotes number of blocks containing records from relation r
Where,
br denotes number of blocks containing records from relation r
tT is average time required to transfer a block of data, in seconds
tS is average block access time, in seconds
If there are multiple records satisfying the selection add transfer cost of the number of
blocks containing records that satisfy selection condition.
4.14 Algorithms for JOIN Operation
JOIN operation is the most time consuming operation to process.
There are Several different algorithms to implement joins
1. Nested-loop join
2. Block nested-loop join
3. Indexed nested-loop join
4. Merge-join
5. Hash-join
Choice of a particular algorithm is based on cost estimate.
Algorithm For Nested Loop Join
This algorithm is for computing r s
Let, r is called the outer relation and s the inner relation of the join.
for each tuple tr in r do begin
for each tuple ts in s do begin
test pair (tr,ts) to see if they satisfy the join condition
if is satisfied, then, add (tr , ts) to the result.
end
end
This algorithm requires no indices and can be used with any kind of join condition.
It is expensive since it examines every pair of tuples in the two relations.
In the worst case, if there is enough memory only to hold one block of each relation,
the estimated cost is
nr × bs + br block transfers, plus nr + br seeks
If the smaller relation fits entirely in memory, use that as the inner relation
o Reduces cost to br + bs block transfers and 2 seeks
For example - Assume the query CUSTOMERS ORDERS (with join attribute only
being CName)
Number of records of customer : 10000 order : 5000
Database Management Systems 4 - 53 Implementation Techniques
In this operation, both the relations are sorted on their join attributes. Then merged
these sorted relations to join them.
Only equijoin and natural joins are used.
The cost of merge join is,
br + bs block transfers + ⌈br /bb⌉ + ⌈bs/bb⌉ seeks + the cost of sorting, if relations are
unsorted.
(4) Hash Join
In this operation, the hash function h is used to partition tuples of both the relations.
h maps A values to {0, 1, ..., n}, where A denotes the attributes of r and s used in the
join.
Cost of hash join is :
3(br + bs ) + 4 × n block transfers + 2(⌈br /bb⌉ + ⌈bs/bb⌉) seeks
If the entire build input can be kept in main memory no partitioning is required
Cost estimate goes down to br + bs.
Out of the above given query evaluation plans, the Fig. 4.15.1 (b) is much faster than
Fig. 4.15.1 (a) because – in Fig. 4.15.1 (a) the join operation is among Branch, Account and
Customer, whereas in Fig. 4.15.1 (b) the join of (Account and Customer) is made with the
selected tuple for City=”Pune” . Thus the output of entire table for join operation is much
more than the join for some selected tuples. Thus we get choose the optimized query.
University Questions
2. Discuss about the Join order optimization and Heuristic optimization algorithms.
AU : May-15, Marks 16
3. Give a detailed description about query processing and optimization. Explain the cost estimation of query
optimization. AU : Dec.-13, Marks 16
Software RAID : Software RAID implements the various RAID levels in the kernel
disk code. It offers the cheapest possible solution, as expensive disk controller
cards.
Q.5 Give the comparison between ordered indices and hashing AU : Dec.-06
Database Management Systems 4 - 57 Implementation Techniques
Ans. :
(1) If range of queries are common, ordered indices are to be used.
(2) The buckets containing records can be chained in sorted order in case of ordered
indices.
(3) Hashing is generally better at retrieving records having a specified value of the
key.
(4) Hash function assigns values randomly to buckets. Thus, there is no simple notion
of “next bucket in sorted order.”
Q.6 What are the causes of bucket overflow in a hash file organization ?
Ans. : Bucket overflow can occur for following reasons -
(1) Insufficient buckets : For the total number of buckets there are insufficient
number of buckets to occupy.
(2) Skew : Some buckets are assigned more records than are others, so a bucket might
overflow even while other buckets still have space. This situation is known as
bucket skew.
Q.7 What can be done to reduce the occurrences of bucket overflows in a hash file
organization ? AU : May-07, June-09, Dce.-12
Ans. :
(1) A bucket is a unit of storage containing one or more records (a bucket is typically a
disk block).
(2) The file blocks are divided into M equal-sized buckets, numbered bucket0,
bucket1... bucketM-1. Typically, a bucket corresponds to one (or a fixed number
of) disk block.
(3) In a hash file organization we obtain the bucket of a record directly from its
search-key value using a hash function, h (K).
(4) To reduce overflow records, a hash file is typically kept 70-80% full.
(5) The hash function h should distribute the records uniformly among the buckets;
otherwise, search time will be increased because many overflow records will exist.
Q.8 Distinguish between dense and sparse indices. AU : May-08, June-09
Ans. :
1. It is preferable to use a dense index instead of a sparse index when the file is not
sorted on the indexed field.
2. Or when the index file is small compared to the size of memory.
Database Management Systems 4 - 58 Implementation Techniques
Q.10 How does B-tree differs from a B+ tree? Why is a B+ tree usually preferred as an
access structure to a data file? AU : Dec.-08
Ans. :
(1) Searching of a key value becomes difficult in B-tree as data can not be found in the
leaf node.
(2) The leaf node can not store linked list and thus wastes the space.
Q.14 What is the basic difference between static hashing and dynamic hashing?
AU : May-13, 15, Dec.-14, 15
Ans. : Query optimization is required for fast execution of long running complex
queries.
Q.16 Which cost component are used most commonly as the basis for cost function
AU : May-17
Ans. : To specify fully how to evaluate a query, we need not only to provide the
relational-algebra expression, but also to annotate it with instructions specifying how
to evaluate each operation. This annotated structure is called query execution plan.
UNIT - V
Syllabus
Distributed Databases : Architecture, Data Storage, Transaction Processing - Object-based
Databases : Object Database Concepts, Object-Relational features, ODMG Object Model, ODL,
OQL - XML Databases : XML Hierarchical Model, DTD, XML Schema, XQuery - Information
Retrieval: IR Concepts, Retrieval Models, Queries in IR systems.
Contents
5.1 Architecture of Distributed Databases...................... May-03,14,16,17,
................................................................................. Dec.-04,07,16,17,................ Marks 16
5.2 Data Storage ............................................................ Dec.-04, May-06, .................. Marks 8
5.3 Transaction Processing............................................ Dec.-12,14, May-18, ............. Marks 8
5.4 Object Database Concepts ...................................... May-18 ................................ Marks 13
5.5 Object Relational Features....................................... Dec.-16,............................... Marks 13
5.6 ODMG Object Model
5.7 XML Hierarchical Mode
5.8 DTD .......................................................................... Dec.-16................................ Marks 15
5.9 XML Schema
5.10 XQuery ..................................................................... May-17 .................................. Marks 5
5.11 IR Concepts
5.12 Retrieval Models
5.13 Queries in IR Systems
5.14 Two Marks Questions with Answers
(5 - 1)
Database Management Systems 5-2 Advanced Topics
The homogeneous databases are kind of database systems in which all sites have
identical software running on them. Refer Fig. 5.1.2
In this system, all the sites are aware of the other sites present in the system and
they all cooperate in processing user’s request.
Each site present in the system, surrenders part of its autonomy in terms of right
to change schemas or software.
The homogeneous database system appears as a single system to the user.
(2) Heterogeneous Databases
The heterogeneous databases are kind of database systems in which different sites
have different schema or software. Refer Fig. 5.1.3.
Database Management Systems 5-4 Advanced Topics
University Questions
1. What are the various features of distributed database versus centralized database system?
AU : Dec.-17, Marks 6, May-17, Marks 8
3. Explain about distributed databases and their characteristics, functions and advantages and
disadvantages. AU : Dec.-07, May-14, Marks 8, May-16, Marks 16
(1) Replication : System maintains multiple copies of data, stored in different sites,
for faster retrieval and fault tolerance.
(2) Fragmentation : Relation is partitioned into several fragments stored in distinct
sites.
Database Management Systems 5-5 Advanced Topics
R101 55 Pune
R102 66 Chennai
R103 77 Mumbai
Fig. 5.2.1 Student Table
Horizontal Fragmentation 1 :
SELECT * FROM Student WHERE Marks >50 AND City=’Pune’
We will get
R101 55 Pune
Horizontal Fragmentation 2 :
SELECT * FROM Student WHERE Marks >50 AND City=’Mumbai’
We will get
R103 77 Mumbai
Vertical Fragmentation 1 :
SELECT * FROM RollNo
R101
R102
R103
Vertical Fragmentation 2 :
SELECT * FROM city
Pune
Chennai
Mumbai
University Questions
1. What are data fragmentations ? Explain various approaches for fragmenting a relation with example.
AU : May-06, Marks 6
2. Explain in detail various approaches used for storing a relation in distributed databases.
AU : Dec.-04, Marks 8
Database Management Systems 5-7 Advanced Topics
In distributed system transaction initiated at one site can access or update data at other
sites. Let us discuss various basic concepts used during transaction processing in
distributed systems –
Local and Global Transactions :
Local transaction Ti is said to be local if it is initiated at site Si and can access or update
data at site Si only.
Global transaction Ti initiated by site Si is said to be global if it can access or update
data at site Si, Sj,Sk and so on.
Coordinating and Participating Sites :
The site at which the transaction is initiated is called coordinating site. The
participating sites are those sites at which the sub-transactions are executing. For example
– If site S1 initiates the transaction T1 then it is called coordinating site. Now assume that
transaction T1(initiated at S1) can access site S2 and S3. Then sites S2 and S3 are called
participating sites.
To access the data on site S2, the transaction T1 needs another transaction T12 on site
S2 similarly to access the data on site S3,the transaction T2 needs some transaction say T13
on site S3. Then transactions T12 and T13 are called sub-transactions. The above described
scenario can be represented by following Fig. 5.3.1.
Fig. 5.3.1
Transaction Manager :
Fig. 5.3.3
Fig. 5.3.4
Fig. 5.3.5
Phase 2 : Recoding Decision Phase
T can be committed of Ci received a ready T message from all the participating
sites : otherwise T must be aborted.
Coordinator adds a decision record, <commit T> or <abort T>, to the log and forces
record onto stable storage. Once the record stable storage it is irrevocable (even if
failures occur)
Failure of Site
There are various cases at which failure may occur
(1) Failure of Participating Sites
If any of the participating sites gets failed then when participating site Si recovers,
it examines the log entry made by it to take the decision about executing
transaction.
o If the log contains <commit T> record: participating site executes redo (T)
o If the log contains <abort T> record: participating site executes undo (T)
o If the log contains <ready T> record: participating site must consult
Coordinating site to take decision about execution of transaction T.
If T committed, redo (T)
If T aborted, undo (T)
If the log of participating site contains no record then that means Si gets failed
before responding to Prepare T message from coordinating site. In this case it
must abort T
(2) Failure of Coordinator
If coordinator fails while the commit protocol for T is executing then participating
sites must take decision about execution of transaction T :
i) If an active participating site contains a <commit T> record in its log, then T must
be committed.
ii) If an active participating site contains an <abort T> record in its log, then T must
be aborted.
iii) If some active participating site does not contain a <ready T> record in its log,
then the failed coordinator Ci cannot have decided to commit T. Can therefore
abort T.
iv) If none of the above cases holds, then all participating active sites must have a
<ready T> record in their logs, but no additional control records (such as <abort
T> of <commit T>). In this case active sites must wait for coordinator site Ci to
recover, to find decision.
Two phase locking protocol has blocking problem.
What is blocking Problem ?
It is a stage at which active participating sites may have to wait for failed coordinator
site to recover.
The solution to this problem is to use three phase locking protocol.
Database Management Systems 5 - 12 Advanced Topics
Phase 1 : This phase is similar to phase 1 of two phase protocol. That means
Coordinator site Ci asks all participants to prepare to commit transaction Ti. The
coordinator then makes the decision about commit or abort based on the response
from all the participating sites.
Phase 2 : In phase 2 coordinator makes a decision as in 2Phase Commit which is
called the pre-commit decision <Pre-commit, T> , and records it in multiple (at
least K) participating sites.
Phase 3 : In phase 3, coordinator sends commit/abort message to all participating
sites.
Under three phase protocol, the knowledge of pre-commit decision can be used to
commit despite coordinator site failure. That means if the coordinating site in case
gets failed then one of the participating site becomes the coordinating site and
consults other participating sites to know the Pre-commit message which they
possess. Thus using this pre-commit message the decision about commit/abort is
taken by this new coordinating site.
This protocol avoids blocking problem as long as less than k sites fail.
Advantage of Three Phase commit protocol
(1) It avoid blocking problem
Drawbacks of Three Phase commit protocol
(1) The overhead is increased.
University Questions
1. Explain with diagrammatic illustration the architecture of a distributed database management system.
AU : May-18, Marks 5
2. Write short note on distributed transactions. AU : Dec.-14, Marks 8
3. Discuss in detail about two phase commit protocol. AU : Dec.-12, May 16, Marks 8
Database Management Systems 5 - 13 Advanced Topics
The core object oriented Data model consists of following object oriented
concepts –
o Object and Object Identifier : Any real world entity is modeled as object.
Associated with each object there is an unique ID maintained which is called
object identifier(OID). Object identifiers used to uniquely identify objects.
Object identifiers are unique. No two objects have the same identifier. each
object has only one object identifier.
o Attribute and method : Every object has state and behavior. The state is a set
of values for attributes of the object. The behavior is set of methods. For
example – Object named circle has state radius and computing its area is a
behavior.
o Class : Class is a means of grouping all the objects which share the same set
of attributes and methods. An object must belong to only one class as an
instance of that class (instance-of relationship). A class is similar to an abstract
data type.
o Class Hierarchy and Inheritance : The class hierarchy is used to represent the
Inheritance property of object oriented paradigm. The inheritance is a
mechanism in which a new class is derived from existing class.
5.4.2.1 Object Structure
Object is a fundamental unit of object oriented programming in which data value
and code is encapsulated in a single unit.
Conceptually all interactions among the objects and rest of the system are carried
out by using messages. Thus the interface among various objects and rest of the
system is messages. Messages can be implemented as procedure invocations.
In general, an object has associated with it:
o A set of variables that contain the data for the object. The value of each
variable is itself an object.
o A set of messages to which the object responds.
o A set of methods, each of which is a body of code to implement each
message; a method returns a value as the response to the message.
For example – the object Circle can have methods such as compute_area()
5.4.2.2 Object Classes
Similar objects are grouped into classes. In other words, object is an instance of a
class. For example
class Student {
/*variables*/
Int RollNo;
String Name;
String address;
/*Messages*/
String get_name();
String get_address();
Int get_RollNo();
}
Methods can be defined separately. For example
int get_rollNo()
{
Return RollNo;
}
5.4.2.3 Inheritance
In object oriented database schema, there can be large number of classes. Some classes
are similar and can be derived from old existing classes. The classes are placed into
specialization or Is-a hierarchy.
For example – Consider following ER model -
The class hierarchy can be defined in pseudo-code in which the variables associated
with each class are as follows
Database Management Systems 5 - 16 Advanced Topics
class Person
{
String name;
String address;
}
class Student isa Person //Inheritance: Deriving child class
{
int RollNo;
String Course_taken;
}
class Employee isa Person
{
date join-date;
int salary;
}
class Teacher isa Employee
{
String Subject;
}
class Accountant isa Employee
{
int office_no;
}
The keyword isa is used to indicate that a class is a specialization of another class.
The specialization of a class are called subclasses. For example - Employee is a
subclass of Person. Conversely, Person is a superclass of class Employee.
Class hierarchy and inheritance of properties from more general classes.
The important benefit of using inheritance is code-reusability. This can be
achieved by means of substitutability property.
Substitutability means any method of a class, say person, can be invoked equally
well with any object belonging to any subclass, such as subclass Teacher of
Person.
5.4.2.4 Multiple Inheritance
In most cases, tree-structured organization of classes is adequate to describe
applications. In such cases, all superclasses of a class are ancestors of descendants
of another in the hierarchy. However, there are situations that cannot be
represented well in a tree-structured class hierarchy.
To avoid the conflicts between two occurrences, we use multiple inheritance.
Database Management Systems 5 - 17 Advanced Topics
o Name : The user supplied name is used for identity. For example file name
can be used for identity of object.
o Built-in : A notion of identity is built-into the data model or programming
languages, and no user-supplied identifier is required.
The links between classes must be interpreted as is-part-of, rather than the is-a
interpretation of links in an inheritance hierarchy.
Containment allows data to be viewed at different granularities by different users.
For example - document can be viewed by a reader for simply reading purpose,
the size and style of paragraphs can be viewed by DTP operator or proof editor
for editing purpose.
The containment hierarchy can be used to find all object contained in document
object.
In certain applications, an object may be contained in several objects. In such
cases, the containment relationship is represented by a DAG rather than by a
hierarchy.
5.4.3 Object Oriented Languages
The expression of object-orientation can be done in one of two ways –
1. The concepts of object-orientation can be used purely as a design tool, and are
encoded into, For example - a relational database. Example - ER modeling.
2. The concepts of object-orientation are incorporated into a language that is
used to manipulate the database.
There are several possible languages into which the concepts can be integrated.
o Object-relational systems - Add complex types and object-orientation to
relational language.
o Persistent programming languages - Extend object-oriented programming
language to deal with databases by adding concepts such as persistence and
collections
5.4.4 Persistent Programming Languages
Persistent Programming languages allow objects to be created and stored in a
database, and used directly from a programming language.
The persistent data is a data that continue to exist even after the program that
created it has terminated.
The persistent programming language allows data to be manipulated directly
from the programming language and there is no need to go through SQL.
Using persistent programming language, the format changes are carried out
transparently by system.
Without a persistent language format changes becomes a burden on the
programmer. As it may cause more code to be written.
This language allows objects to be manipulated in-memory.
Database Management Systems 5 - 20 Advanced Topics
Drawbacks
If the programming errors occur then it might damage the database system.
Due to complexity of programming languages, high level of optimization can not
be achieved.
It does not support declarative query as well as relational database.
5.4.4.1 Persistence of Object
Persistence of object can be achieved by following approaches –
(1) Persistence by Class : Declare all objects of the class to be persistent to achieve the
persistence of the objects.
(2) Persistence by Creation : Introduce new syntax to create persistent objects.
(3) Persistence by marking : An object that is to persist beyond program execution is
marked as persistent before program termination.
(4) Persistence by Reference : One or more objects can be explicitly declared as (root)
persistent objects. All other objects are persistent if they are referred, directly or
indirectly, from a root persistent object.
Thus it is easy to make the entire data structure persistent by merely declaring the root
of the structure as persistent.
5.4.4.2 Object Identity and Pointers
A persistent object is assigned a persistent object identifier.
There are several degrees of permanence of object identity
o Intraprocedure : Identity persists only during the execution of a single
procedure, e.g., local variables within procedures.
o Intraprogram : Identity persists only during the execution of a single program
or query, e.g., global variables in programming languages, and main memory
or virtual memory pointers.
o Interprogram : Identity persists from one program execution to another, e.g.,
pointers to file system data on disk but may change if the way data is stored
in the file system is changed.
o Persistent : Identity persists not only among program executions but also
among structural reorganizations of the data. This is the persistent form of
identity required for object-oriented systems.
o The source code that implements the methods should be stored in the
database as a part of the schema, along with type definition.
o The data is stored individually for each object.
University Question
1. Explain the necessary characteristics a system must satisfy to be considered as an object oriented
database management system. AU : May-18, Marks 13
The relational model with object database enhancement is called as object relational
model.
Following are some object database features that can be included in SQL
(1) For specifying the complex objects some type constructors can be added. For
example – row type corresponds to tuple in the schema, array type is for
specifying the collection. Some other types of constructors are set, list, and bag.
(2) The mechanism of for specifying the object identity through the use of reference
type is be included.
(3) The relationships can be specified using reference.
(4) Encapsulation of operation is provided through the mechanism of user defined
types(UDT). Similarly the concept of user defined routines allow the definition of
general methods.
(5) The mechanism of inheritance can be provided using the keyword UNDER.
Example 5.5.1 Suppose that you have been hired as a consultant to choose a database system
for your client's application. For each of the following, applications, state what type of
database system (relational, persistent programming language-based OODB, object
relational; do not specify a commercial product) you would recommend. Justify your
recommendation.
(i) A computer-aided design system for a manufacturer of airplanes.
(ii) A system to track contributions made to candidates for public office.
(iii) An information system to support the making of movies. AU : Dec.-16, Marks 13
Database Management Systems 5 - 22 Advanced Topics
Solution :
5.6.1 ODL
(1) Declaring Classes Using ODL
For declaring classes using ODL we need to use –
For example -
interface Student(Key RollNo)
{
attribute integer RollNo;
attribute string Name;
attribute string address;
attribute string course_id;
};
5.6.2 OQL
Object Query Language (OQL) is a query language standard for object-oriented
databases modeled after SQL.
The following rules apply to OQL statements
(1) All complete statements must be terminated by a semi-colon.
(2) A list of entries in OQL is usually separated by commas but not terminated
by a comma(,).
(3) Strings of text are enclosed by matching quotation marks.
Basic From of OQL: SELECT, FROM, WHERE
Syntax
SELECT <list of values>
FROM <list of collections and variable assignments>
WHERE <condition>
Where the SELECT clause extracts those elements of a collection meeting a specific
condition. By using the keyword DISTINCT duplicated elements in the resulting
collection get eliminated. Collections in FROM can be either extents (persistent names -
sets) or expressions that evaluate to a collection (a set).
Example : Give the names of people who are older than 30 years old:
SELECT SName: p.name
FROM p in People
WHERE p.age > 30
DOT notations and Path expressions
We use the dot notation and path expressions to access components of complex values.
For example
ta.salary -> real
t.students -> set of tuples of type tuple(name: string, fee: real) representing students
t.salary -> real
Database Management Systems 5 - 25 Advanced Topics
(1) Element
o In XML the basic entity is element. The elements are used for defining the
tags. The elements typically consist of opening and closing tag. Mostly only
one element is used to define a single tag.
o The syntax of writing any element for opening tag is
<element name>
o The syntax of writing any closing element for closing tag is
</element name >
o For example -
o <Student> I am learning Computer Engineering </Student>
o Every XML must have a single root element.
o The other elements are called the container elements. There can be one or
more elements inside the container elements such elements are called the
child elements.
o An empty tag can be defined by putting a / (forward slash) before closing
bracket.
o For example -
<Student/>
Database Management Systems 5 - 26 Advanced Topics
XML Document
<Person> Root Element
<Personal-Info> Container Element
<Name>Anuradha</Name> Child Element
<City> Pune</City>
</Personal-Info>
<Hobby>
<first>Reading</first>
<second>Programming</second>
<third>Singing</third>
</Hobby>
</Person>
The above XML document can be represented as
Example
Consider following xml document -
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE student [
<!ELEMENT student (name,address,std,marks)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT std (#PCDATA)>
<!ELEMENT marks (#PCDATA)>
]>
<student>
<name> Anand</name>
<address>Pune</address>
<std>Second</std>
<marks>70 percent</marks>
</student>
Output
Example 5.8.1 : Create a DTD for a catalog of four stroke motorbikes, where each motor bike has
the following child elements -make, model, year, colo, engine, chasis number and
accessories. The engine element has the child elements engine number, number of cylinders,
type of fuel. The accessories element has the attributes like disc brake,auto-start and radio,
each of which is required and has the possible values yes and no.
Solution :
automobile.dtd
<!ELEMENT Catalog (MotorBike)*>
<!ELEMENT MotorBike (make, model, year,color, engine,chasis-num,accessories ) >
<!ELEMENT make (#PCDATA)>
<!ELEMENT model (#PCDATA)>
<!ELEMENT year (#PCDATA)>
Database Management Systems 5 - 29 Advanced Topics
Example 5.8.2 : Give the DTD or XML Schema for an XML representation of the following
nested-relational schema :
Emp = (ename, ChildrenSet setof(Children), SkillsSet setof(Skills))
Children = (name, Birthday)
Birthday = (day, month, year)
Skills = (type, ExamsSet setoff(Exams))
Exams = (year, city). AU : Dec.-16, Marks 15
3. The DTD cannot define the type of data contained within the XML document.
Hence using DTD we cannot specify whether the element is numeric, or string or
of date type.
4. There are some XML processor which do not understand DTDs.
3) Then comes xs:element which is used to define the xml element. In above case the
element Student is of complex type who have four child elements : name, address,
std and marks. All these elements are of simple type string.
Step 2 : Now develop XML document in which the desired values to the XML
elements can be given. I have named this file as MySchema.xml
XML Document [MySchema.xml]
<?xml version="1.0" encoding="UTF-8"?>
<Student xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="StudentSchema.xsd">
<name>Anand</name>
<address>Pune</address>
<std>Second</std>
<marks>70 percent</marks>
</Student>
Script Explanation :
1) In above XML document, the first line is typical in which the version and encoding
is specified.
2) There is xmlns:xsi attribute of document which indicates that XML document is an
instance of XML schema. And it has come from the namespace
“http://www.w3.org/2001/XMLSchema-instance.”
3) To tie this XML document with the some Schema definition we use the attribute
xsi:noNamespace SchemaLocation. The value that can be passed to this attribute
is the name of xsd file “StudentSchema.xsd”.
4) After this we can define the child elements.
Step 3 : See the output in the browser window as follows -
Database Management Systems 5 - 32 Advanced Topics
Advantages of schema
1. The schemas are more specific.
2. The schema provide the support for data types .
3. The XML schema is written in XML itself and has a large number of built in and
derived types.
4. The XML schema is the W3C recommendation. Hence it is supported by various
XML validator and XML processors.
Disadvantages of schema
1. The XML schema is complex to design and hard to learn.
2. The XML document cannot be if the corresponding schema file is absent.
3. Maintaining the schema for large and complex operations sometimes slows down
the processing of XML document.
Difference between DTD and XML Schema
Sr. No. DTD Schema
1. DTD are basic for defining the structural Schema are specific for complex operations
components of XML. while defining the structural components of
XML.
2. DTD does not have ability to define the Schemas have ability to define the type of
type of data. data.
3. Simple and small applications can be built Large number of complex applications can
using DTD be built using schema.
4. DTDs allow inline definitions XML schema does not allow inline
definitions.
XQuery is used to query the XML database. It is just similar to SQL to database.
XQuery is supported by all major databases.
Its main task is to get information out of XML databases.
Example of XQuery
Step 1: Consider following simple XML document(this document is saved by name
students.xml) that stores student information
<students>
<student>
Database Management Systems 5 - 33 Advanced Topics
<name>AAA</name>
<age>15</age>
<student>
<name>BBB</name>
<age>16</age>
<student>
<name>CCC</name>
<age>17</age>
<student>
<name>DDD</name>
<age>18</age>
</student>
<student>
<name>EEE</name>
<age>19</age>
<student>
<name>FFF</name>
<age>20</age>
</students>
Step 2 : We can write the XQuery as follows –
for $x in doc("students.xml")/students/student Here x is a variable
where $x/age>16 Query
return $x/name returns names of students satisfying the query
This query will return the names of students having the age >16.
Step 3 : The result will be –
CCC
DDD
EEE
FFF
University Question
1. Explain the process of querying XML data with an example. AU : May-17, Marks 5
Unstructured data is more like human language. It doesn’t fit nicely into
relational databases. For example - emails, text documents (Word docs, PDFs,
Database Management Systems 5 - 35 Advanced Topics
etc.), social media posts, videos, audio files, and images all of these represent
unstructured form of data.
Useful information is locked up in all unstructured data. The information in
emails and social media, for example, holds important insight that can be used for
operational intelligence, marketing intelligence, and more.
There are various ways to retrieve useful information from unstructured data. The
most popularly used technique is use of queries.
Definition of Information Retrieval : Information Retrieval(IR) is the process of
retrieving documents from a collection in response to a query submitted by a user.
Concept of Query
Information Retrieval system is beyond database systems that do not limit its user
to use specific query language, nor do they expect its user to know the structure of
schema.
User can make use of free form of search request which is called as query to
retrieve information from IR system. This query is also called as keyword search
query.
Characteristics of IR Systems
The IR system is characterized at different levels : i) By types of users ii) Types of
data iii) Types of information need.
Types of Users :
o There are two types of users – expert user and layperson user.
o The expert user is a kind of user who is searching for specific information that
is clear in his mind. He/She also knows the form of query to be submitted to
get the desired information. For example – User who wants to get the
information about particular book.
o The layperson user is a user with generic information need. This type of user
can not create highly relevant queries for search.For example – Student
searching for information on particular topic.
Types of Data :
o Search systems can be tailored to specific types of data.
o For example, the problem of retrieving information about a specific topic may
be handled more efficiently by customized search systems that are built to
collect and retrieve only information related to that specific topic.
o These domain specific or vertical IR systems are not as large as generic
WWW. The information can be retrieved efficiently from such IR systems.
Database Management Systems 5 - 36 Advanced Topics
5. Results are based on exact matching. Results are based on approximate matching.
o Based on this matching, the ranked list of documents containing these words
is presented to the user.
(2) Semantic Approach :
o In this approach the knowledge base technique of information retrieval is
used.
o The syntactical, lexical, sentential, discourse based and pragmatic level of
words are used to prepare knowledge base for understanding.
o In practice, semantic approaches also apply some form of statistical analysis
to improve retrieval process.
Generic IR Framework
Following Fig. 5.11.1 represents the various stages involved in an IR processing.
The stages at the left – such as preprocessing, modeling, indexing are all offline process
which prepare a set of documents for efficient retrieval.
The steps involved in query formation, query processing, searching mechanism,
document retrieval, and relevance feedback are shown on the right.
5.12 Retrieval Models
(1) Boolean Model
In this model, the queries are using operators such as AND, OR , NOT.
Database Management Systems 5 - 39 Advanced Topics
|D|
Wt,d = tft, d log
|{d D|t d}|
and
tft, d is term frequency of term t in document d (a local parameter)
|D|
log is inverse document frequency (a global parameters). |D| is
|{d D|t d}|
the total number of documents in the document set; |{d D| t d}| is the
number of documents containing the term t.
(3) Probabilistic Model
The vector space model assumes that those documents closer to the query in
cosine space are more relevant to the query vector.
In the probabilistic model, a more concrete and definitive approach is taken:
ranking documents by their estimated probability of relevance with respect to the
query and the document.
In the probabilistic framework, the IR system has to decide whether the
documents belong to the relevant set or the non-relevant set for a query.
To make this decision, it is assumed that a predefined relevant set and
nonrelevant set exist for the query, and the task is to calculate the probability that
the document belongs to the relevant set and compare that with the probability
that the document belongs to the nonrelevant set.
Given the document representation D of a document, estimating the relevance R
and nonrelevance NR of that document involves computation of conditional
probability P(R|D) and P(NR|D). These conditional probabilities can be
calculated using Bayes’ Rule
o P(R|D) = P(D|R) × P(R)/P(D)
o P(NR|D) = P(D|NR) × P(NR)/P(D)
(4) Semantic Model
In semantic models, the process of matching documents to a given query is based
on concept level and semantic matching instead of index term (keyword)
matching.
This allows retrieval of relevant documents that share meaningful associations
with other documents in the query result.
Semantic approaches include different levels of analysis, such as morphological,
syntactic, and semantic analysis, to retrieve documents more effectively.
Database Management Systems 5 - 41 Advanced Topics
In morphological analysis the parts of query such as noun, verbs, adjective and so
on are analyzed.
In syntactical analysis complete phrases in the document are parsed and then
analyzed.
In semantic analysis to resolve the ambiguities in the words the synonyms are
used. For this analysis complex knowledgebase and information retrieval heuristic
is used.
These systems often require techniques from artificial intelligence and expert
systems. Knowledge bases like Cyc and WordNet have been developed for use in
knowledge-based IR systems based on semantic models.
5.13 Queries in IR Systems
Various types of queries used for information retrieval are as given below -
(1) Keyword Queries
o This is the simplest and most commonly used form of queries.
o Various keywords are used for searching the desired documents. These
keywords are combined using AND, OR operators.
o The query containing keyword “Java Programming” will return the
documents containing the words “Java” and “Programming”
o Some systems remove the most commonly used words such as a , the , of and
so on.
o All retrieval models support for keyword queries.
(2) Boolean Queries
o Some IR systems allow using of boolean operators such as AND, OR, NOR
for formulation of query.
o AND(or dot operator) means both the terms are required, OR(+ operator)
means either term1 or term2 must be present and NOT(–) means list the
documents not containing particular term.
o Complex Boolean queries can be built out of these operators and their
combinations, and they are evaluated according to the classical rules of
Boolean algebra.
o No ranking is possible, because a document either satisfies such a query (is
“relevant”) or does not satisfy it.
o A document is retrieved for a Boolean query if the query is logically true as an
exact match in the document.
Database Management Systems 5 - 42 Advanced Topics
Ans. : (1) Replication : System maintains multiple copies of data, stored in different sites,
for faster retrieval and fault tolerance.
(2) Fragmentation : Relation is partitioned into several fragments stored in distinct
sites.
Database Management Systems 5 - 43 Advanced Topics
Q.3 What are various fragmentations? State various fragmentations with example.
AU : Dec.-17
Ans. : There are two types of fragmentations – horizontal fragmentation and vertical
fragmentation
Example – Refer section 5.2.2.
Q.4 What are the advantages of distributed databases ? AU : Dec.-04, May-08
Ans. :
(1) There is fast data processing as several sites participate in request processing.
(2) Reliability and availability of this system is high.
(3) It possess reduced operating cost.
(4) It is easier to expand the system by adding more sites.
(5) It has improved sharing ability and local autonomy.
Q.4 List out the reasons for development of distributed databases. AU : May-06
Each site provides part of its autonomy in terms Each site maintains its own right to change the
of right to change or software. schema or software.
Due to same schema- there is no problems in Due to different schemas, there are lot of
query processing. problems in query processing.
Ans. : • The XML schemas are used to represent the structure of XML document.
The goal or purpose of XML schema is to define the building blocks of an XML
document. These can be used as an alternative to XML DTD.