MCA07
MCA07
FUNDAMENTALS OF DATABASE
MANAGEMENT SYSTEM
CONTENTS
No part of this publication which is material protected by this copyright notice may be produced or
transmitted or utilized or stored in any form or by any means now known or hereinafter invented,
electronic, digital or mechanical, including photocopying, scanning, recording or by any information
storage or retrieval system, without prior written permission from the KKHSOU.
Printed and published by Registrar on behalf of the Krishna Kanta Handiqui State Open University.
Unit - 1 introduces file structure and organization. Some basic concepts like data,
information, field, record and files are described in this unit. This unit will help you to
understand operation on files and different file organization techniques.
Unit - 2 is on database management system. This unit gives you the idea of DBMS, its
advantages and disadvantages, DBMS users, DBMS language etc.
Unit- 3 concentrates on data models. Data Models like Object based models, Relational
models, Network models, Hierarchical models etc. are discussed in this unit.
Unit - 5 and Unit - 6 are on structured query language. With these units learners will be
able to write SQL systax using various SQL commands.
Unit - 7 deals with relational database design. Concepts of normalization and various
normal forms are described in this unit
Each unit of this course includes some along-side boxes to help you know
some of the difficult, unseen terms. Some “EXERCISES” have been included to help
you apply your own thoughts. You may find some boxes marked with: “LET US KNOW”.
These boxes will provide you with some additional interesting and relevant information.
Again, you will get “CHECK YOUR PROGRESS” questions. These have been designed
to make you self-check your progress of study. It will be helpful for you if you solve the
problems put in these boxes immediately after you go through the sections of the units
and then match your answers with “ ANSWERS TO CHECK YOUR PROGRESS”
given at the end of each unit.
MASTER OF COMPUTER APPLICATIONS
Fundamentals of Database Management System
DETAILED SYLLABUS
*****
UNIT 1 : FILE STRUCTURE AND ORGANIZATION
UNIT STRUCTURE
1.2 INTRODUCTION
Data and infomation are closely related and are often used
interchangeably. Data may be defined as a known fact that can be recorded
and that have implicit meaning. It can be anything like name of a person or
a place of stay or a number representing roll number of student, ticket
number of passenger, passport number of an immigrant, bill number for a
payee, credit card number for a holder, identification number for an employee
etc. Data is the name given to basic facts and entities such as names and
numbers. There are uncountable examples of data such as weights, prices,
costs, numbers of items sold or purchased, employee names, product
names, addresses, tax codes, registration marks etc.
Fig. 1.1
tables, printed documents, pay slips, receipts, reports etc. The information
is obtained by assembling items of data into a meaningful forms. For example,
marks obtained by students and their roll numbers form data, the report
card/sheet is the information. Other forms of information are pay-slips,
schedules, reports, worksheet, bar charts, invoices and accounts returns
etc. It may be noted that information containing wisdom is known as
knowledge or it is applied form of information. Information becomes utility
oriented only when it is applied in particular area of expertise or specialization.
4th record
File
Field content
Fig. 1.2 : Concept of field, records and files
For example, in the following figure 1.2, a srudent result file is shown
in tabular format. It contains ten records of students and each record has
four related fields namely Roll_No, Name, Percentage, Division. The
collection of informations about a particular student in one line or row of the
Fundamentals of Database Management System 8
table is an example of record. The collection of result information for all the
students (all columns and rows), i.e., the entire table is an example of file.
Each record in a file may contain many fields, but the value in a certain field
may uniquely determine the record in the file. Such a field is known as key
field. In case of student’s record in fig(1.2), Roll_No is the key field because
it is unique for each student. Similarly, Part_no in a stock file, Account_no
in a bank customer’s file are all examples of key fields. A file can be of
following two types:
Master File
and Transaction File
A master file contains records of relatively permanent data. For
example, the name, roll number, sex, date of birth, address of a student
would appear on a student master file.
A transaction file is a file in which the current data are stored for
subsequent processing usually in combination with a master file. Transaction
file contains the records that are used to update the records of the master
file. For example, a transaction file may contain the roll number of students
whose home address has recently changed. This file will used to update
the record of the student whose roll number matches with the transaction
file record. The record of the master file will thus be updated as and when
necessary. It is very important that the transaction file should contain records
with the fields and their names in the same order and size and type of the
master file. Otherwise the process of updating of the master file will cause
an error.
where most, if not all, of the data records need to be processed one
after another. In these applications, data records for every employee
or customer needs to be processed at scheduled intervals (in this
case monthly). However, while working with a sequential-access
device, if an address is required out of order, it can only be reached
by searching through all those addresses, which are stored before
it. For instance, data stored at the last locations cannot be accessed,
until all preceding locations in the sequence have been traversed.
This is analogous to a music tape cassette. Suppose you like to
hear a particular song which is in 8th position in the cassette. For
that you can “fast forward” the first seven songs. Although, not fully
played, the 7 songs are still accessed. Magnetic tape is an example
of sequential access storage device. The main drawbacks of this
type of organization are:
Sequential file are not suitated for on-line enquiry where up-
to-date information is required.
Information on the file is not always current.
Addition and deletion of records are not simple task.
Searching information in a sequential file can be a very slow
process. For any search operation, we need to start reading
a sequential file from the beginning and continue till the end,
or untill the desired record is found, whichever is earlier. This
is both time-consuming and cumbersome.
Some advantages of sequential file organization are:
File design is simple
Low-cost file medium. Tape can be used.
Data may be defined as a known fact that can be recorded and that
have implicit meaning.
Information is processed and organised data. It can be defined as
collection of related data that when put together, communicate
meaningful and useful message to a recipient who uses it.
The smallest piece of meaningful information is called a field.
A record is collection of field values or data items of a given entity.
A file is organized to ensure that records are available for processing.
There are three types of organization used in computer to store
records. They are: Sequential access, direct access and index-
sequential access organization.
a) Sequential organization means storing records in contiguous
blocks according to a key field. Magnetic tape is an example of
sequential access storage device.
b) Indexed-sequential organization stores records sequentially but
uses an index to locate records. Records are related through
chaining using pointers.
1. i) True, ii) True, iii) False, iv) True, v) False, vi) True,
vii) True, viii) False, ix) True, x) False
UNIT STRUCTURE
In the previous unit, we have learnt about the basic idea of data,
information, fields and records. In addition, concepts of files and basic file
organization techniques are also introduced. All these are the elementary
concepts relating to the database system.
In this unit, we will learn the concept of database and DBMS. We
will also discuss the three-tier database architecture and different types of
DBMS languages. Moreover, responsibilitties of database administrator will
also be discussed at the end of this unit.
Student Financial
Administration Management
Course Faculty
Administration Administration
Student Financial
Administration Management
DATA
BASE
Course Faculty
Administration Administration
EXTERNAL SCHEMA
External level
User User User …… User view n
Conceptual level
CONCEPTUAL SCHEMA
File File
File
Conceptual level
Internal level
Stored_Item Length = 40
Number Type = Byte (6), Offset = 0, Index = Ix
Name Type = Byte (20), Offset = 6
Price Type = Byte (8), Offset = 26
ReOrderQuantity Type = Byte (4), Offset = 34
EMPLOYEE
UNIT STRUCTURE
3.2 INTRODUCTION
There have been many data models proposed in the literature. They
fall into three broad categories:
Record Based Data Models
Object Based Data Models
Physical Data Models
Record Based Data Models: A record based data model is used to specify
the overall logical structure of the database. In this model the database
consists of a no. of fixed formats of different types.Each record type defines
a fixed no. of fields having a fixed length.
There are 3 principle types of record based data model. They are:
1. Hierarchical data model.
2. Network data model.
3. Relational data model.
Hierarchical Model : The hierarchical data model organizes data in a tree
structure. There is a hierarchy of parent and child data segments. This
structure implies that a record can have repeating information, generally in
the child data segments. Data in a series of records, which have a set of
field values attached to it. It collects all the instances of a specific record
together as a record type. These record types are the equivalent of tables
in the relational model, and with the individual records being the equivalent
of rows. To create links between these record types, the hierarchical model
uses Parent Child Relationships. These are a 1:N mapping between record
types. This is done by using trees, like set theory used in the relational
model, "borrowed" from maths.
For example, an organization might store information about an
employee, such as name, employee number, department, salary. The
organization might also store information about an employee's children,
such as name and date of birth. The employee and children data forms a
hierarchy, where the employee data represents the parent segment and
the children data represents the child segment. If an employee has three
Hierarchical DBMSs were popular from the late 1960s, with the
introduction of IBM's Information Management System (IMS) DBMS, through
the 1970s.
Advantages
Many features form the foundation for current data models
Generated a large installed base of programmers
Who developed solid business applications
Disadvantages
Complex to implement
Difficult to manage
Lacks structural independence
Implementation limitations
Lack of standards (Company vs. Industry or Open)
Network Model : The popularity of the network data model coincided with
the popularity of the hierarchical data model. Some data were more naturally
modeled with more than one parent per child. So, the network model
permitted the modeling of many-to-many relationships in data. In 1971, the
Conference on Data Systems Languages (CODASYL) formally defined the
network model. The basic data modeling construct in the network model is
the set construct. A set consists of an owner record type, a set name, and
a member record type. A member record type can have that role in more
3.7 ER MODEL
ER Diagram Symbols :
Entity Attribute
Relationship
Derived attribute
Fig. 3.2
Fig. 3.3
Fig. 3.4
Fig. 3.5
Fig. 3.6
Binary relationship :
Fig. 3.7
Ternary relationship :
Fig. 3.8
Fig. 3.9
Fig. 3.10
Fig. 3.11
Fig. 3.13
Fig. 3.14
Fig. 3.13
Q.1. Explain the difference between weak and strong entity set.
Q.2. Draw an ER diagram for the following application from the
manufacturing industry:
Each supplier has a unique name.
More than one supplier can be located in the same city.
Each part has a unique part number.
Each part has a colour.
A supplier can supply more than one part.
A part can be supplied by more than one supplier.
A supplier can supply a fixed quantity of each part.
Q.3. Define the concept of aggregation. Give an example where this
concept is useful.
Q.4. Construct an ER Diagram . A General Hospital consists of a number
of specialized wards (such as Maternity, Paediatry, Oncology, etc).
Each ward hosts a number of patients, who were admitted on the
recommendation of their own GP and confirmed by a consultant
employed by the Hospital. On admission, the personal details of
every patient are recorded. A separate register is to be held to store
the information of the tests undertaken and the results of a prescribed
treat ment. A number of tests may be conducted for each patient.
Each patient is assigned to one leading consultant but may be
examined by another doctor, if required. Doctors are spe cialists in
some branch of medicine and may be leading con sultants for a
number of patients, not necessarily from the same ward.
UNIT STRUCTURE
4.2 INTRODUCTION
Fig. 4.1
Fig. 4.2
One of the strengths of the relational model is that, in principle, any
value occurring in two different records belonging to the same table or to
different tables, implies a relationship among those two records. Yet, in order
Fundamentals of Database Management System 68
to enforce explicit integrity constraints, relationships between records in
tables can also be defined explicitly, by identifying or non-identifying parent-
child relationships characterized by assigning cardinality . Tables can also
have a designated single attribute or a set of attributes that can act as a
"key", which can be used to uniquely identify each tuple in the table.
A key that can be used to uniquely identify a row in a table is called
a primary key. Keys are commonly used to join or combine data from two or
more tables. For example, an Employee table may contain a column named
Location which contains a value that matches the key of a Location table.
Keys are also critical in the creation of indexes, which facilitate fast retrieval
of data from large tables. Any column can be a key, or multiple columns can
be grouped together into a compound key. It is not necessary to define all
the keys in advance; a column can be used as a key even if it was not
originally intended to be one.
Relational databases are currently the predominant choice in storing
data like financial records, medical records, personal information and
manufacturing and logistical data.The major advantages of the relational
model over the older data models are its simple data representation and
the ease with which even complex queries can be expressed.
Fig. 4.3
The instance contains six tuples and has five fields. Note that no
two rows are identical. This is a requirement of the relational model,each
relation is defined to be a set of unique tuples or rows. The order in which
the rows are listed is not important.
A relation schema species the domain of each field or column in the
relation instance. These domain constraints in the schema specify an
important condition that we want each instance of the relation to satisfy:
The values that appear in a column must be drawn from the domain
associated with that column. Thus, the domain of a field is essentially the
type of that field, in programming language terms, and restricts the values
that can appear in the field.
The degree of a relation is the number of fields. The cardinality of
a relation instance is the number of tuples in it. In the figure above(Fig.4.3)
the degree of the relation (the number of columns) is five, and the cardinality
of this instance is six. A relational database is a collection of relations with
distinct relation names. The relational database schema is the collection of
schemas for the relations in the database. For example, a University
database with relations called Students, Faculty, Courses, Rooms, Enrolled,
Fundamentals of Database Management System 70
Teaches.
An instance of a relational database is a collection of relation
instances, one per relation schema in the database schema; of course,
each relation instance must satisfy the domain constraints in its schema.
For the table above we have a student_id that uniquely identifies the
students in a student table. This would be a candidate key. But in the same
table we might have the student’s FirstName and Last Name that also,
when combined, uniquely identify the student in a student table. These would
both be candidate keys.
In order to be eligible for a candidate key it must pass certain criteria.
It must contain unique values
It must not contain null values
It contains the minimum number of fields to ensure uniqueness
It must uniquely identify each record in the table
Once your candidate keys have been identified you can now select
one to be your primary key .
Set Operations :
The standard operations on sets are also available in relational algebra.
These are : union (), intersection (), set-difference (–), and cross-product
(X).
Union : R S returns a relation instance containing all tuples that
occur in either relation instance R or relation instance S (or both). R and S
must be unioncompatible, and the schema of the result is defined to be
identical to the schema of R. Two relation instances are said to be union-
compatible if the following conditions hold:
a) they have the same number of the fields, and
b) corresponding fields, taken in order from left to right, have the
same domains. For example :
Fig. 4.4
Fig. 4.5
Fig. 4.6
Fig. 4.7
Natural Join :
Invariably the JOIN involves an equality test, and thus is often described as
an equi-join. Such joins result in two attributes in the resulting relation having
exactly the same value. A `natural join' will remove the duplicate attributes.
Fig.4.8
In most systems a natural join will require that the attributes have
the same name to identify the attribute to be used in the join.
Fundamentals of Database Management System 82
This may require a renaming mechanism.
If you do use natural joins make sure that the relations do not
have two attributes with the same name by accident.
Outer Joins :
Much of the data is lost when applying a join to two relations. In some cases
this lost data might hold useful information. An outer join retains the
information that would have been lost from the tables, replacing missing
data with nulls.
The result of a join (or inner join) consists of tuples formed by
combining matching tuples in the two operands, an outer join contains those
tuples and additionally some tuples formed by extending an unmatched
tuple in one of the operands by "fill" values for each of the attributes of the
other operand.
There are three forms of the outer join, depending on which data is
to be kept.
LEFT OUTER JOIN - keep data from the left-hand table
RIGHT OUTER JOIN - keep data from the right-hand table
FULL OUTER JOIN - keep data from both tables
The result of the left outer join (represented by )is the set of all
combinations of tuples in R and S that are equal on their common attribute
names, in addition to tuples in R that have no matching tuples in S. In the
resulting relation, tuples in S which have no common values in common
attribute names with tuples in R take a null value.For an example consider
the tables Employee and Dept and their left outer join: NULL value is
represented by
Fig. 4.10
Fig. 4.11
UNIT STRUCTURE
A data type is an attribute that specifies the type of data that the
column in a table can hold. The following table shows the commonly used
data type in SQL:
Data Type Description
CHAR(n) Stores character strings of fixed length. The
fixed size is specified in the parenthesis. It can
hold upto 255 characters. The string can contain
letters, numbers, and special characters.
VARCHAR2(n) Stores character strings of variable length. The
maximum size is specified in parenthesis. Can
also store up to 255 characters. It is a more
flexible form of the CHAR data type due to its
variable length characteristics.
DATE Stores date data. The standard format is DD-
MON-YYYY. For example, 10-OCT-2008.
DateTime stores date in the 24-hour format.
INT Stores integers, that is numbers without a
decimal part.
Fig. 5.1
The SUM() function returns the total sum of the numeric values of a
column. The syntax is as follows:
SELECT SUM <column_name> FROM <tablename>;
UNIT STRUCTURE
SQL AND joins together two or more conditions and returns result
only when all of the conditions are true. AND helps to query for very specific
Fundamentals of Database Management System 114
records. There is no limit to the number of AND conditions that can be
applied to a query by utilizing the WHERE clause.
For example, the following query will find out only the rows where
the customer identity is ‘c10’ and the product identity is ‘p5’. It will not find a
row for customer identity ’c12’ and product identity p5.
SELECT *
FROM Sales
WHERE cust_id=’c10’
AND prod_id=’p5’;
The output of the above query will be as follows:
Cust_id Prod_id Quantity
c10 p5 25
This example illustrates how SQL AND combines multiple conditional
statements into a single condition.
6.5 OR OPERATOR
6.7 IN Operator
NULL value means unknown or missing data value. SQL treats any
zero-length string like a NULL value. Sometimes there may be records in a
table containing no value. This may be because during the data entry time
the data was not available or for some rows in a table that particular field is
not applicable. A table column, by default, can have NULL values. The NULL
value is different from other values like a blank or a zero as zero is a numeric
value and a blank space is a character value.
NULL values in a column of a database table can be tested by using
the operators IS NULL and IS NOT NULL. Suppose to display the records
with NULL values in the Phone column of the Employee table, the following
expression can be written in SQL.
SELECT emp_name, phone FROM Employee
WHERE phone IS NULL;
Constraints can be applied to table columns to prevent the addition
of invalid data or deletion of data that is required to maintain the overall
consistency of the database. The constraints are used to control the data
being entered into a database table.
In SQL, NOT NULL constraint can be defined at the column level in
addition with the primary and foreign key. This constraint when defined on
a column means that the column cannot be left empty. It becomes a
mandatory column and a value must be entered into it.
The syntax for it is as:
<column_name> <datatype>(<size>) NOT NULL
For example, to create a student table, the SQL statement is –
CREATE TABLE Student(
Roll_no varchar2(10),
Name varchar2(20),
D_o_b date NOT NULL,
Address varchar2(30));
When inserting values in the columns of the Student table, the date
of birth (d_o_b) field if not entered will display a error message. This field
value has to be entered or each row in the table data.
Fundamentals of Database Management System 124
The NOT NULL constraint can be applied only at the column level of
a table.
UNIT STRUCTURE
7.2 INTRODUCTION
7.3 NORMALIZATION
The first normal form (1NF) sets the very crucial rule to create the
database :
1. There are no repeating or duplicate fields.
2. Each cell contains only a single value.
3. Each record is unique and identified by primary key.
The normalization process involves getting our data to adjust in a
progressive normal forms and you cannot achieve the higher level of
normalization without satisfying the previous levels. The First Normal Form
asking the values in each cell of a table must be atomic. The word atomic
describes that there should be no sets of values in one particular cell.
Violation of any of these conditions would mean that the table is not
strictly relational, and therefore that it is not in 1NF.
Example 1 : Lets explore this principle with an example – a table
within a human resources database that stores the manager-subordinate
relationship. For the purposes of our example, we will impose the business
rule that each manager may have one or more subordinates while each
subordinate may have only one manager.
Intuitively, when creating a table to track this information, we might
create a table with the following fields:
Manager
Subordinate1
Subordinate2
Subordinate3
Subordinate4
However, recall the first rule imposed by 1NF: eliminate duplicative
columns from the same table. Clearly, the Subordinate1-Subordinate4
columns are duplicative. Take a moment and ponder the problems raised
by this scenario. If a manager only has one subordinate – the Subordinate2-
Subordinate4 columns are simply wasted storage space (a precious
database commodity). Furthermore, imagine the case where a manager
Second Normal Form (2NF): A 1NF table is in 2NF if and only if,
given any candidate key K and any attribute A that is not a constituent of a
candidate key, A depends upon the whole of K rather than just a part of it.
2NF attempts to reduce the amount of redundant data in a table by
extracting it, placing it in new tables and creating relationships between
those tables.
Example : Conside a table called Customers with the following
elements:
CustNum
FirstName
LastName
Address
City
State
ZIP
A brief look at this table reveals a small amount of redundant data.
We're storing for state zip and city. Now, that might not seem like too much
added storage in our simple example, but if we had thousands of rows in
our table. Additionally, if the ZIP code for city were to change, we'd need to
make that change in many places throughout the database.
In a 2NF-compliant database structure, this redundant information
is extracted and stored in a separate table. Our new table (let's call it ZIPs)
might have the following fields:
Fundamentals of Database Management System 138
ZIP
City
State
Now that we've removed the duplicative data from the Customers
table, we've satisfied the first rule of second normal form. We still need to
use a foreign key to tie the two tables together. We'll use the ZIP code (the
primary key from the ZIPs table) to create that relationship. Here's our new
Customers table:
CustNum
FirstName
LastName
Address
ZIP
We've now minimized the amount of redundant information stored
within the database and our structure is in second normal form!
Example 2 :
Un normalized Student table
Student# AdvID AdvName AdvRoom Class1 Class2
123 123A James 555 102-8 104-9
124 123B Smith 467 209-0 102-8
Student Table in 1NF
Student# AdvID AdvName AdvRoom Class#
123 123A James 555 102-8
123 123A James 555 104-9
124 123B Smith 467 209-0
124 123B Smith 467 102-8
Student Table in 2NF
Student# AdvID AdvName AdvRoom
123 123A James 555
124 123B Smith 467
Student# Class#
123 102-8
123 104-9
Fundamentals of Database Management System 139
124 209-0
124 102-8
Third Normal Form (3NF): There are two basic requirements for a
database to be in third normal form:
Already meet the requirements of both 1NF and 2NF
Remove columns that are not fully dependent upon the
primary key.
Example : An example of a 2NF table that fails to meet the
requirements of 3NF is:
Tournament Winners
Tournament Year Winner Winner Date of Birth
Indian Invitational 1998 Al Fredrickson 21 July 1975
US Open 1999 Bob Albertson 28 Sep 1968
Grand Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 Mar 1977
Because each row in the table needs to tell us who won a particular
Tournament in a particular Year, the composite key {Tournament, Year} is a
minimal set of attributes guaranteed to uniquely identify a row. That is,
{Tournament, Year} is a candidate key for the table.
The breach of 3NF occurs because the non-prime attribute Winner
Date of Birth is transitively dependent on the candidate key {Tournament,
Year} via the non-prime attribute Winner. The fact that Winner Date of Birth
is functionally dependent on Winner makes the table vulnerable to logical
inconsistencies, as there is nothing to stop the same person from being
shown with different dates of birth on different records.
In order to express the same facts without violating 3NF, it is
necessary to split the table into two:
Tournament Winners
Tournament Year Winner
Indian Invitational 1998 Al Fredrickson