Dbms Notes
Dbms Notes
Dbms Notes
Top of Form
Hierarchical Model:
Data is organized in a tree-like structure with parent-child
relationships.
Each child can have only one parent.
Commonly used in older systems, such as IMS (Information
Management System) databases.
Network Model:
Data is organized as a collection of records and relationships.
Supports many-to-many relationships between entities.
Records are connected through pointers.
Used in systems like CODASYL databases.
Relational Model:
Data is organized into tables (relations) consisting of rows (tuples)
and columns (attributes).
Relationships between tables are established using foreign keys.
One of the most widely used data models, implemented in relational
database management systems (RDBMS) such as MySQL,
PostgreSQL, Oracle, and SQL Server.
Object-oriented Model:
Data is represented as objects containing attributes and methods.
Supports inheritance, encapsulation, and polymorphism.
Popular in object-oriented databases (OODBMS) and object-
relational mapping (ORM) frameworks.
Entity-Relationship (ER) Model:
Represents data as entities, attributes, and relationships.
Entities are objects or concepts with attributes that describe them.
Relationships depict associations between entities.
Often used for conceptual modeling before implementing a database
in a relational or other model.
Document Model:
Data is stored as documents, typically in JSON or XML format.
Documents can contain nested structures and arrays.
Widely used in NoSQL databases like MongoDB, Couchbase, and
Elasticsearch for flexible and schema-less data storage.
Graph Model:
Represents data as nodes (vertices) and edges (relationships) between
them.
Used to model complex relationships and networks, such as social
networks, recommendation systems, and network infrastructure.
Graph databases like Neo4j and Amazon Neptune implement this
model for efficient traversal and analysis of graph data.
These data models provide different ways of structuring and
organizing data, each with its own strengths and weaknesses
depending on the requirements of the application and the nature of the
data being stored.
Entity Set
1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute. Strong
Entity does not depend on other Entity in the Schema. It has a primary
key, that helps in identifying it uniquely, and it is represented by a
rectangle. These are called Strong Entity Types.
2. Weak Entity
An Entity type has a key attribute that uniquely identifies each entity
in the entity set. But some entity type exists for which key attributes
can’t be defined. These are called Weak Entity types.
For Example, A company may store the information of dependents
(Parents, Children, Spouse) of an Employee. But the dependents can’t
exist without the employee. So Dependent will be a Weak Entity
Type and Employee will be Identifying Entity type for Dependent,
which means it is Strong Entity Type.
A weak entity type is represented by a Double Rectangle. The
participation of weak entity types is always total. The relationship
between the weak entity type and its identifying strong entity type is
called identifying relationship and it is represented by a double
diamond.
Key Attribute
2. Composite Attribute
An attribute composed of many other attributes is called a composite
attribute. For example, the Address attribute of the student Entity type
consists of Street, City, State, and Country. In ER diagram, the
composite attribute is represented by an oval comprising of ovals.
Composite Attribute
3. Multivalued Attribute
An attribute consisting of more than one value for a given entity. For
example, Phone_No (can be more than one for a given student). In ER
diagram, a multivalued attribute is represented by a double oval.
Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type
is known as a derived attribute. e.g.; Age (can be derived from DOB).
In ER diagram, the derived attribute is represented by a dashed oval.
Derived Attribute
The Complete Entity Type Student with its Attributes can be
represented as:
Entity and Attributes
Relationship Type and Relationship Set
A Relationship Type represents the association between entity types.
For example, ‘Enrolled in’ is a relationship type that exists between
entity type Student and Course. In ER diagram, the relationship type
is represented by a diamond and connecting the entities with lines.
Entity-Relationship Set
A set of relationships of the same type is known as a relationship set.
The following relationship set depicts S1 as enrolled in C2, S2 as
enrolled in C1, and S3 as registered in C3.
Relationship Set
Degree of a Relationship Set
The number of different entity sets participating in a relationship set is
called the degree of a relationship set.
1. Unary Relationship: When there is only ONE entity set
participating in a relation, the relationship is called a unary
relationship. For example, one person is married to only one person.
Unary Relationship
2. Binary Relationship: When there are TWO entities set participating
in a relationship, the relationship is called a binary relationship. For
example, a Student is enrolled in a Course.
Binary Relationship
3. Ternary Relationship: When there are n entities set participating in
a relation, the relationship is called an n-ary relationship.
Cardinality
The number of times an entity of an entity set participates in a
relationship set is known as cardinality. Cardinality can be of different
types:
1. One-to-One: When each entity in each entity set can take part only
once in the relationship, the cardinality is one-to-one. Let us assume
that a male can marry one female and a female can marry one male.
So the relationship will be one-to-one.
the total number of tables that can be used in this is 2.
Indexing in DBMS
Indexing is used to optimize the performance of a database by
minimizing the number of disk accesses required when a query is
processed.
The index is a type of data structure. It is used to locate and access
the data in a database table quickly.
Index structure:
Indexes can be created using some database columns.
The first column of the database is the search key that contains a
copy of the primary key or candidate key of the table. The values of
the primary key are stored in sorted order so that the corresponding
data can be accessed easily.
The second column of the database is the data reference. It contains a
set of pointers holding the address of the disk block where the value
of the particular key can be found.
Indexing Methods
Ordered indices
The indices are usually sorted to make searching faster. The indices
which are sorted are known as ordered indices.
Example: Suppose we have an employee table with thousands of
record and each of which is 10 bytes long. If their IDs start with 1, 2,
3....and so on and we have to search student with ID-543.
In the case of a database with no index, we have to search the disk
block from starting till it reaches 543. The DBMS will read the
record after reading 543*10=5430 bytes.
In the case of an index, we will search using indexes and the DBMS
will read the record after reading 542*2= 1084 bytes which are very
less compared to the previous case.
Primary Index
If the index is created on the basis of the primary key of the table,
then it is known as primary indexing. These primary keys are unique
to each record and contain 1:1 relation between the records.
As primary keys are stored in sorted order, the performance of the
searching operation is quite efficient.
The primary index can be classified into two types: Dense index and
Sparse index.
Dense index
The dense index contains an index record for every search key value
in the data file. It makes searching faster.
In this, the number of records in the index table is same as the
number of records in the main table.
It needs more space to store index record itself. The index records
have the search key and a pointer to the actual record on the disk.
Sparse index
In the data file, index record appears only for a few items. Each item
points to a block.
In this, instead of pointing to each record in the main table, the index
points to the records in the main table in a gap.
Clustering Index
A clustered index can be defined as an ordered data file. Sometimes
the index is created on non-primary key columns which may not be
unique for each record.
In this case, to identify the record faster, we will group two or more
columns to get the unique value and create index out of them. This
method is called a clustering index.
The records which have similar characteristics are grouped, and
indexes are created for these group.
Example: suppose a company contains several employees in each
department. Suppose we use a clustering index, where all employees
which belong to the same Dept_ID are considered within a single
cluster, and index pointers point to the cluster as a whole. Here
Dept_Id is a non-unique key.
The previous schema is little confusing because one disk block is
shared by records which belong to the different cluster. If we use
separate disk block for separate clusters, then it is called better
technique.
Secondary Index
In the sparse indexing, as the size of the table grows, the size of
mapping also grows. These mappings are usually kept in the primary
memory so that address fetch should be faster. Then the secondary
memory searches the actual data based on the address got from
mapping. If the mapping size grows then fetching the address itself
becomes slower. In this case, the sparse index will not be efficient.
To overcome this problem, secondary indexing is introduced.
In secondary indexing, to reduce the size of mapping, another level
of indexing is introduced. In this method, the huge range for the
columns is selected initially so that the mapping size of the first level
becomes small. Then each range is further divided into smaller
ranges. The mapping of the first level is stored in the primary
memory, so that address fetch is faster. The mapping of the second
level and actual data are stored in the secondary memory (hard disk).
For example:
If you want to find the record of roll 111 in the diagram, then it will
search the highest entry which is smaller than or equal to 111 in the
first level index. It will get 100 at this level.
Then in the second index level, again it does max (111) <= 111 and
gets 110. Now using the address 110, it goes to the data block and
starts searching each record till it gets 111.
This is how a search is performed in this method. Inserting, updating
or deleting is also done in the same manner.
Dynamic multilevel indexes are a data structure used in database
systems to efficiently organize and search large volumes of data. B+
trees and B- trees are two types of tree structures commonly used to
implement indexes in databases.
A B+ tree is a balanced tree data structure where each node contains a
sorted list of keys and pointers to child nodes. In a B+ tree, data
entries are stored only in leaf nodes, and non-leaf nodes are used for
navigation purposes. B+ trees are often used for indexing in databases
because they provide efficient range queries and support ordered
traversal of data.
A B- tree is similar to a B+ tree, but it allows more keys to be stored
in each node, which reduces the height of the tree and improves
performance. In a B- tree, data entries can be stored in both leaf and
non-leaf nodes, unlike B+ trees where only leaf nodes contain data
entries.
To implement dynamic multilevel indexes using B+ and B- trees, you
can use a combination of these tree structures at different levels. For
example, you can use a B+ tree as the top-level index to quickly
locate the appropriate B- tree for a given range of keys. Each B- tree
can then contain pointers to the actual data entries or further levels of
B+ or B- trees, depending on the size and distribution of the data.
This multilevel approach allows for efficient indexing and searching
of large datasets by reducing the height of the tree and minimizing the
number of disk accesses required to locate data entries. Additionally,
the use of both B+ and B- trees provides flexibility in handling
different types of queries and data distributions.
Top of Form
UNIT 2
Types of models(Already in UNIT 1)
Constraints on Relational Database Model
01 Bikash 6000000009
02 Paul 9000090009
01 Tuhin 9234567892
Explanation: In the above table, EID is the primary key, and the first
and the last tuple have the same value in EID ie 01, so it is violating
the key constraint.
3. Entity Integrity Constraints
Entity Integrity constraints say that no primary key can take a NULL
value, since using the primary key we identify each tuple uniquely in
a relation.
Example:
01 Bikash 9000900099
02 Paul 600000009
Explanation: In the above relation, EID is made the primary key, and
the primary key can’t take NULL values but in the third tuple, the
primary key is null, so it is violating Entity Integrity constraints.
4. Referential Integrity Constraints
The Referential integrity constraint is specified between two relations
or tables and used to maintain the consistency among the tuples in
two relations.
This constraint is enforced through a foreign key, when an attribute in
the foreign key of relation R1 has the same domain(s) as the primary
key of relation R2, then the foreign key of R1 is said to reference or
refer to the primary key of relation R2.
The values of the foreign key in a tuple of relation R1 can either take
the values of the primary key for some tuple in relation R2, or can
take NULL values, but can’t be empty.
Example:
01 Divine 12
02 Dino 22
04 Vivian 14
DNO Place
12 Jaipur
13 Mumbai
14 Delhi
A B C
1 2 4
2 2 3
3 2 3
4 3 4
For the above relation, σ(c>3)R will select the tuples which have c
more than 3.
A B C
1 2 4
4 3 4
Note: The selection operator only selects the required tuples but does
not display them. For display, the data projection operator is used.
2. Projection(π): It is used to project required column data from a
relation.
Example: Consider Table 1. Suppose we want columns B and C from
Relation R.
π(B,C)R will show following columns.
B C
2 4
2 3
3 4
FRENCH
Student_Name Roll_Number
Ram 01
Mohan 02
Vivek 13
Geeta 17
GERMAN
Student_Name Roll_Number
Vivek 13
Geeta 17
Shyam 21
Rohan 25
Student_Name
Ram
Student_Name
Mohan
Vivek
Geeta
Shyam
Rohan
Note: The only constraint in the union of two relations is that both
relations must have the same set of Attributes.
4. Set Difference(-): Set Difference in relational algebra is the same
set difference operation as in set theory.
Example: From the above table of FRENCH and GERMAN, Set
Difference is used as follows
π(Student_Name)FRENCH - π(Student_Name)GERMAN
Student_Name
Ram
Mohan
Note: The only constraint in the Set Difference between two relations
is that both relations must have the same set of Attributes.
5. Set Intersection(∩): Set Intersection in relational algebra is the
same set intersection operation in set theory.
Example: From the above table of FRENCH and GERMAN, the Set
Intersection is used as follows
π(Student_Name)FRENCH ∩ π(Student_Name)GERMAN
Student_Name
Note: The only constraint in the Set Difference
between two relations is that both relations must
Vivek have the same set of Attributes.
6. Rename(ρ): Rename is a unary operation used for
Geeta renaming attributes of a relation.
ρ(a/b)R will rename the attribute 'b' of the relation
by 'a'.
7. Cross Product(X): Cross-product between two relations. Let’s say
A and B, so the cross product between A X B will result in all the
attributes of A followed by each attribute of B. Each record of A will
pair with every record of B.
Example:
Ram 14 M
Sona 15 F
Kim 20 M
B
ID Course
1 DS
2 DBMS
AXB
Ram 14 M 1 DS
Ram 14 M 2 DBMS
Sona 15 F 1 DS
Sona 15 F 2 DBMS
Kim 20 M 1 DS
Kim 20 M 2 DBMS
Note: If A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘
n*m ‘ tuples.
Derived Operators
These are some of the derived operators, which are derived from the
fundamental operators.
Natural Join(⋈)
Conditional Join
1. Natural Join(⋈): Natural join is a binary operator. Natural join
between two or more relations will result in a set of all combinations
of tuples where they have an equal common attribute.
Example:
EMP
Name ID Dept_Name
A 120 IT
B 125 HR
C 110 Sales
D 111 IT
DEPT
Dept_Name Manager
Sales Y
Production Z
IT A
A 120 IT A
C 110 Sales Y
D 111 IT A
ID Sex Marks
1 F 45
2 F 55
3 F 60
S
ID Sex Marks
10 M 20
11 M 22
12 M 59
1 F 45 10 M 20
1 F 45 11 M 22
2 F 55 10 M 20
2 F 55 11 M 22
3 F 60 10 M 20
3 F 60 11 M 22
3 F 60 12 M 59
-- Orders table
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
TotalAmount DECIMAL(10, 2),
FOREIGN KEY (CustomerID) REFERENCES
Customers(CustomerID)
);
Now, let’s perform some operations to demonstrate referential
integrity:
-- Valid data insertion
INSERT INTO Customers (CustomerID, FirstName, LastName,
Email)
VALUES (1, 'John', 'Doe', 'john.doe@example.com');