
DBMS FINAL SEM PPT

S.NO UNITS QUESTION PAGE

1  DBMS ARCHITECTURE 3-8

2  ER MODEL 9-24

3 UNIT-1 CONSTRAINTS 25-29

4  ENHANCED ER MODEL 30-36

5 UNIT-2 NORMALIZATION 37-49

6  KEYS 50-65

7 UNIT-3 BASIC STRUCTURE OF SQL QUERY 66-78

8 UNIT-4 HASHING 79-93

9  DISTRIBUTED DATABASES 94-100

10 UNIT-5 TRANSACTIONS 101-105

11  DEADLOCK HANDLING 106-112

12  SHORT 113-119


DATABASE ARCHITECTURE OR
SYSTEM STRUCTURE
 The typical architecture of a DBMS is based on the relational data
model.
 It shows the application interfaces used by the different
types of users.
 The four main components of database
architecture are:
1. Database users.
2. Query processor.
3. Storage manager.
4. Disk storage.
i. Database users
 Naive users:
 They are also known as unsophisticated users.
 They interact with the system by invoking application program
interfaces (web/mobile/UI interfaces) that have been written previously.
Ex: clerks, tellers, etc.
 Application Programmers:
 Computer Professionals who write application programs.
 Sophisticated Users:
 They interact with the system in the form of either using a database
query language or by using tools such as data analysis software.
 Database administrators:
 They have central control on both the data and the programs that access
those data.
 They use administration tools to interact with the system.
ii. Query processor
 The query processor helps the database system
simplify and provide access to data.
 DDL interpreter: Interprets DDL statements and records the
definitions in the data dictionary.
 DML compiler: Translates DML statements in a query language into
step-by-step low-level instructions that the database system can follow to
perform the desired actions.
 Query evaluation engine: Executes the low-level instructions
generated by the DML compiler.
 When a user submits a query, it is presented to a query
optimizer, which uses information about the data to build an
efficient execution plan for evaluating the query.
 An execution plan is a blueprint for evaluating a query; it is carried out
by the query evaluation engine.
iii. Storage manager
The storage manager is the component of the database system that provides
the interface between the low-level data stored in the database and the
application programs and queries submitted to the system.
The storage manager is responsible for storing, retrieving, and updating
data in the database.
The storage manager components are:
 Buffer manager: Manages the fetching of data from disk storage into main
memory.
The buffer manager also decides what data to cache in main memory.
 File manager: Manages the allocation of space on disk storage and the
representation of the information on disk.
 Authorization and integrity manager: Validates the users who attempt to
access data and tests for the satisfaction of integrity constraints.
 Transaction manager: Ensures the database remains consistent despite
system or hardware failures, and handles conflicts between concurrent accesses.
iv. Disk storage
The storage manager implements several data structures as
part of the physical system implementation:
 Data files: Store the database itself.
 Data dictionary: Stores metadata, particularly the
schema of the database.
 Indices: Provide fast access to data items
present in the database.
 Statistical data: Stores statistics about the data, which help the
system make decisions.
ER MODEL
The Entity-Relationship model, or ER model, is a way of
graphically representing the logical relationships of
entities (or objects) in order to create a database.
The ER model is one of the most widely used data models for
designing databases.
The basic components of the model are:

1. Entities
2. Attributes
3. Relationships
i. Entity:
An entity is a "thing" or "object" in the real world that is
distinguishable from all other objects.
Example: If we want to create a database for a college, then
'student', 'teacher', 'course', 'building', etc.
are the entities.
 Representation: A simple rectangular box.
 Entity set:
An entity set is a set of entities of the same type that
share similar properties, or attributes.
 Representation: A vertical oval (ellipse) listing the
entities.
Example entity sets: Student, Courses, Teachers, Building.
 Weak entity:
A weak entity is one that depends on another entity
for its existence.
A weak entity cannot be uniquely identified by its own attributes alone.
Representation: Double rectangle.
 Strong entity:
An entity that is independent of other entities
is called a strong entity.
A strong entity can be uniquely identified using its
primary key attributes.
Representation: Single rectangle.
ii. Attributes:
 The characteristics or properties which describe an entity are called its attributes.
 An entity is represented by a set of attributes.
 Attributes are descriptive properties possessed by each member of an entity set.
 Representation: A horizontal oval (ellipse) labelled with the attribute name.
 Types of Attributes:

1. Key attributes

2. Composite attribute

3. Multivalued attribute

4. Derived attribute
 Key attributes:
• A key attribute is the one which will uniquely identify
an entity.
• Ex: For a student, studentID is the key attribute.
• Representation: Underlined attribute name.
 Composite attribute:
A composite attribute can be divided into subparts;
multiple component attributes together compose
a larger one.
Ex: Address (hno, street, city, state, country, etc.)
Representation: A horizontal oval connected to further
ovals for its component attributes.
 Multivalued attribute:
An attribute that can hold multiple values is known as a
multivalued attribute.
Ex: A student can have more than one phone number,
so the phone number attribute is multivalued.
Representation: Double ellipse.
 Derived attribute:
Derived attributes are those which are derived from
other attributes.
Ex: The age of a student can be calculated from the
date_of_birth attribute.
Representation: Dashed ellipse.
iii) Relationship:
A relationship depicts the association between different
entities existing in a database.
 Representation: Diamond box with entities
connected to its edges.
 Relationship set: A set of relationships of the
same type.
Types of relationships

1. Unary relationship
2. Binary relationship
3. Ternary relationship
4. N-ary relationship
 Unary relationship:
In a unary relationship, both associated entity types are the
same, and the degree of the relationship is 1.
In other words, the association is within a single entity.
Example: In a class we have many students, and some of them are
monitors. Class monitors are themselves students, so only the
Student entity participates here. The degree of such a
relationship is 1.
 Binary relationship:
A relationship that associates two different entities is known as a
binary relationship. The degree of the relationship is 2.
This is the most commonly used relationship, and it can easily be
converted into a relational table.
Example: We have two entity types, 'Student' and 'Course', where
each 'Student' is enrolled in a 'Course'.
 Ternary relationship:
A relationship that associates three entities is known as a ternary
relationship.
The degree of the relationship is 3.
Example: We have three entity types, 'Teacher', 'Course' and
'Subject'.
The relationship between these entities is that a Teacher
teaches a particular Course and also a particular Subject.
 N-ary relationship:
An n-ary relationship is a complex relationship
among several entities.
It is hard to convert into relational tables, as
n entity types are involved.
Example: We have 5 entities: Teacher, Class, Location,
Salary, Course. Here five entity types are associated.
The degree is 5.
CONSTRAINTS
Constraints in DBMS (Database Management Systems)
are rules or conditions applied to database data that
ensure data integrity and consistency.
Mapping Cardinality

Mapping cardinalities, or cardinality ratios, express the number of
entities to which another entity can be associated via a relationship
set.
For a binary relationship set R between entity sets A and B, the
mapping cardinality must be one of the following:

i. One-to-One:
An entity in A is associated with at most one entity in B, and an entity
in B is associated with at most one entity in A.
It is represented by the symbol 1:1.

The above example depicts that a 'Student' can have at most
one 'ID' and each 'ID' can correspond to at most one 'Student'.
ii. One-to-Many:
• An entity in A is associated with one or more entities
in B.
• An entity in B can be associated with at most one
entity in A.
• It can be represented as 1:M.
iii. Many-to-One:
An entity in A is associated with at most one entity in B.
An entity in B can be associated with one or more entities
in A.
It can be represented as M:1.

• The above example depicts that each 'Student' can join at
most one 'College', and each 'College' can have many 'Students'.
iv. Many-to-Many:
An entity in A is associated with one or more entities in B.
An entity in B can also be associated with one or more entities in A.
It can be represented as M:M.

The above example depicts that 'Employees' can work on many 'Projects'
and a 'Project' can have many 'Employees'.
ENHANCED ENTITY RELATIONSHIP MODEL (EER
MODEL)

Using the ER model for bigger data sets creates a lot of
complexity while designing a database model.
So, in order to minimize the complexity, some new
concepts were added:
1. Generalization
2. Specialization
3. Aggregation
i. Generalization:
Generalization is a bottom-up design process in which
two or more lower-level entities are combined to form
a higher-level entity if they have some common
attributes or properties.
It combines subclasses to make a superclass.
The symbol used for specialization/generalization is a triangle labelled 'ISA'.
 Consider the two entities 'Faculty' and 'Student'. Each has some
characteristics of its own (Faculty: Address, Name, Salary;
Student: Address, Name, Fees).
 In this example, the common attributes 'address' and 'name' of both
'Faculty' and 'Student' can be combined into one higher-level
entity, 'Person'. This process is called generalization.
 In generalization the size of the schema is reduced. There is no
inheritance in generalization.
ii. Specialization:
• Specialization is a top-down approach where one
higher level entity can be broken into two or more
lower-level entities.
• So that subset of entities that share some
distinguishing characteristics can be identified.
• The process of designating subgroupings within an
entity set is called specialization.
 Consider an entity ‘Account’. This will have some attributes consider them
‘acc_no’ and ‘balance’.
 ‘Account’ entity may be divided into :
 Current_acc: acc_no, balance, transaction.
 Savings_acc: acc_no, balance, interest rate.
 Hence, we can say that specialized entities inherit the characteristics of the
higher-level entity.
 In specialization, the size of the schema increases.
iii. Aggregation:
Aggregation is a type of association that represents a
"whole-part" relationship between two entities.
It is used to depict scenarios where one entity (the
whole) is composed of multiple related entities (the
parts).
Definition: Aggregation is a relationship between two
entities where one entity represents a higher-level
concept (the whole) that is composed of one or more other
entities (the parts).
Example:

If a student visits a 'University', they may inquire not only about
the university but also about the 'Courses' it offers.
The 'University' entity could aggregate 'Courses'.
Normalization
Normalization divides larger tables into smaller tables and links
them using relationships.
Normalization is used to minimize redundancy in a set of
relations.
It is used to eliminate undesirable characteristics like insertion,
update and deletion anomalies.
Why do we need normalization?
The main reason for normalizing relations is to remove these
anomalies.
Failure to eliminate anomalies leads to data redundancy and can
cause data integrity and other problems as the database grows.
Data modification anomalies can be categorized into three
types:
 Insertion anomaly: When one cannot insert a new tuple
into a relation due to a lack of data.
 Deletion anomaly: When the deletion of some data results
in the unintended loss of other important data.
 Update anomaly: When an update of a single data value
requires multiple rows of data to be updated.
Types of Normal Forms
Normalization works through a series of stages called
normal forms.
The normal forms apply to individual relations.
A relation is said to be in a particular normal form if it
satisfies that form's constraints.
1. 1NF
2. 2NF
3. 3NF
4. BCNF (Boyce-Codd Normal Form)
i. First Normal Form (1NF)
A relation is in 1NF if every attribute contains only atomic values.
An attribute of a table cannot hold multiple values; it must hold only
a single value.
Each record needs to be unique.
Example: Employee table
The relation Employee is not in 1NF because of the multivalued attribute
"Emp_phone".
We will split the offending rows in order to remove the multivalued attribute.
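As a sketch of this step (assuming a hypothetical Employee table in which the Emp_phone column holds several values), the multivalued attribute can be flattened into one row per phone number:

```python
# Hypothetical Employee rows; Emp_phone holds multiple values (violates 1NF).
employees = [
    {"Emp_id": 101, "Emp_name": "AYRA", "Emp_phone": ["9966225544", "8854628742"]},
    {"Emp_id": 102, "Emp_name": "BRAN", "Emp_phone": ["6395874501"]},
]

# 1NF conversion: emit one row per phone number so every value is atomic.
employees_1nf = [
    {"Emp_id": e["Emp_id"], "Emp_name": e["Emp_name"], "Emp_phone": phone}
    for e in employees
    for phone in e["Emp_phone"]
]

for row in employees_1nf:
    print(row)
```

Employee 101 now appears in two rows, one per phone number, and every attribute value is atomic.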
ii. Second Normal Form (2NF)
For 2NF, the relation must first be in 1NF.
There should be no partial dependency.
No partial dependency: no non-prime attribute is dependent on any
proper subset of any candidate key of the table.
Non-prime attribute: an attribute which is not part of any candidate
key.
Note: for a candidate key AB, A and B are proper subsets of it; AB itself is not.
Example: Score table
The above table is in 1NF as it is free from multivalued attributes.
The primary key of the above table is {Stu_id, Sub_id}, which is a
composite primary key.
Candidate key = {Stu_id, Sub_id}
Here, the non-prime attributes are {Marks, Teacher}.
The 'Teacher' attribute depends entirely on 'Sub_id': Sub_id ->
Teacher.
Teacher is independent of 'Stu_id'.
So the non-prime attribute 'Teacher' depends on a proper subset of the
candidate key, which violates 2NF.
Procedure to convert a table into 2NF:
Identify the partial and full dependencies and apply the decomposition
rule.
In the above table, the column 'Teacher' depends only on
'Sub_id'.
The simple solution is to create a separate table for 'Sub_id'
and 'Teacher' and remove 'Teacher' from the Score table.
The resulting tables are:
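A minimal sketch of that decomposition, assuming hypothetical Score rows:

```python
# Hypothetical Score table: candidate key is (stu_id, sub_id),
# but teacher depends only on sub_id (a partial dependency).
score = [
    {"stu_id": 1, "sub_id": "CSC101", "marks": 70, "teacher": "WALTER"},
    {"stu_id": 2, "sub_id": "CSC101", "marks": 65, "teacher": "WALTER"},
    {"stu_id": 1, "sub_id": "ECE101", "marks": 80, "teacher": "JESSE"},
]

# Decompose: Subject(sub_id, teacher) records the partial dependency
# exactly once per subject.
subject = {r["sub_id"]: r["teacher"] for r in score}

# Score keeps only the attributes fully dependent on the whole key.
score_2nf = [{"stu_id": r["stu_id"], "sub_id": r["sub_id"], "marks": r["marks"]}
             for r in score]

print(subject)
print(score_2nf)
```

After the split, each teacher-subject pairing is stored once instead of being repeated for every enrolled student.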
iii. Third Normal Form (3NF)
A relation is in 3NF if it is in 2NF and
there is no transitive dependency for non-prime attributes.
 No transitive dependency: no non-prime attribute depends on another
non-prime attribute; non-prime attributes may depend only on a candidate key.
Equivalently, a relation is in third normal form if it holds at least one of the
following conditions for every non-trivial functional dependency X -> Y:
1. X is a super key.
2. Y is a prime attribute (i.e. each element of Y is part of some candidate key).
In the above Employee table, we have the dependency
Project_id -> Date_completion
Here, 'Date_completion' depends on 'Project_id', which
is a non-prime attribute.
Primary key = {Emp_id}
Super keys = {Emp_id}, {Emp_id, Name}, ...
Candidate key = {Emp_id}
Non-prime attributes = {Name, Project_id, Date_completion}
This transitive dependency violates 3NF.
 Solution:
Each transitive dependency is moved into a separate table.
Keep the primary key of each new table as a foreign key in the
original table, along with all
remaining attributes.
iv. Boyce-Codd Normal Form (BCNF)
A table is said to be in BCNF if it satisfies the following
properties:
1. It is in 3NF.
2. For every functional dependency A -> B, A must be a super
key; i.e. in any functional dependency, the left-hand side must
be a super key.
• From the above table, the super keys are {Stu_id, Course_id}, {Stu_id, Course_id, Instructor}, ...
• Here, the relation is not in BCNF because 'Instructor' determines
'Course_ID', but 'Instructor' is not a super key.
 Decomposition into BCNF:
To convert the table into BCNF:
1. Identify the violation:
Instructor -> Course_ID violates BCNF.
2. Decompose the relation:
Create one table for the dependency Instructor -> Course_ID:
 Instructor_Course (Instructor, Course_ID)
Create another table for the remaining attributes:
 Enrollment (Student_ID, Instructor)
KEYS
Keys are one of the basic requirements of a relational
database model.
They are widely used to uniquely identify the tuples (rows) in a
table.
We also use keys to set up relationships among the various
columns and tables of a relational database.
Types of Keys
i. Primary Key
ii. Foreign Key
iii. Candidate key
iv. Super key
v. Composite key
vi. Surrogate key
i. Candidate key

The minimal set of attributes that uniquely identifies a tuple
is known as a candidate key.

Stu_ID  NAME  PHONE NUMBER  AGE
1       AYRA  9966225544    23
2       BRAN  8854628742    25
3       JOHN  6395874501    26

Each row of the table is a tuple; each column is an attribute.
We can uniquely identify any tuple in the above table with the
attribute Stu_ID.
We could also uniquely identify tuples using the Name and Phone attributes
but, since the definition says minimal, we choose Stu_ID.
STU_ID  COURSE_ID  NAME
1       ECE101     BASIC ELECTRONICS
2       CSC101     C++
3       CSC101     C++
1       EEE101     BASIC ELECTRONICS
2       CSE201     DBMS

This is the course_enrolment table.
In this table we can uniquely identify a tuple using the
(Stu_ID, Course_ID) attributes instead of Stu_ID alone,
because Stu_ID values are repeated.
Candidate key: (Stu_ID, Course_ID)
A candidate key can be composite in nature.
 Features of a candidate key:
A candidate key must be non-null in every record/tuple.
Only unique values are allowed.
A candidate key may have multiple attributes.
It should contain the minimum number of fields needed to ensure
uniqueness.
Examples: username, user_ID/Student_ID, phone number,
passport number, Aadhaar number, licence number.
ii. Super key

A super key is a set of one or more attributes that uniquely identifies a
tuple in a relation.
Note:
Candidate key - minimal
Super key - any number of attributes
STU_ID  COURSE_ID  NAME
1       ECE101     BASIC ELECTRONICS
2       CSC101     C++
3       CSC101     C++
1       EEE101     BASIC ELECTRONICS
2       CSE201     DBMS

As we know, (Stu_ID, Course_ID) can uniquely identify a tuple; if
we add the Name attribute, the combination still identifies a tuple uniquely.
(Stu_ID, Course_ID) is a candidate key; a candidate key plus one or more extra
attributes is a super key.
Every candidate key is a super key, but the converse is not true.
If we add one or more attributes to a candidate key, it becomes a
super key.
Number the attributes STU_ID, COURSE_ID, NAME, ADDRESS as 1, 2, 3, 4.

Super keys (S.K.) =
{1,2}
{1,2,3}
{1,2,4}
{1,2,3,4}
The minimal super key is the candidate key.
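The two definitions above can be checked mechanically. Below is a sketch (helper names are my own) that tests whether a set of attributes is a super key, i.e. its projection is unique across rows, and whether it is also minimal, i.e. a candidate key:

```python
from itertools import combinations

# The course_enrolment rows from the example above.
rows = [
    {"STU_ID": 1, "COURSE_ID": "ECE101", "NAME": "BASIC ELECTRONICS"},
    {"STU_ID": 2, "COURSE_ID": "CSC101", "NAME": "C++"},
    {"STU_ID": 3, "COURSE_ID": "CSC101", "NAME": "C++"},
    {"STU_ID": 1, "COURSE_ID": "EEE101", "NAME": "BASIC ELECTRONICS"},
    {"STU_ID": 2, "COURSE_ID": "CSE201", "NAME": "DBMS"},
]

def is_superkey(rows, attrs):
    # attrs is a super key iff no two rows agree on all of these attributes.
    projected = {tuple(r[a] for a in attrs) for r in rows}
    return len(projected) == len(rows)

def is_candidate_key(rows, attrs):
    # A candidate key is a super key with no smaller super key inside it.
    if not is_superkey(rows, attrs):
        return False
    return not any(is_superkey(rows, sub)
                   for sub in combinations(attrs, len(attrs) - 1))

print(is_candidate_key(rows, ("STU_ID", "COURSE_ID")))
print(is_candidate_key(rows, ("STU_ID", "COURSE_ID", "NAME")))
```

(STU_ID, COURSE_ID) passes both tests, while adding NAME still gives a super key but fails the minimality test.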
iii. Primary Key

A unique, non-null attribute (or set of attributes) which uniquely identifies
each tuple in a relation is known as the primary key.
A table can have only one primary key.
For example, suppose the candidate keys are {User_ID} and {User_Name}.
Both are candidate keys but, we choose one
of them as the primary key in such a way that the chosen attribute
will remain stable over time.
For simplicity we need only one attribute to do the job.
For example, between phone number and Stu_ID, Stu_ID would be chosen as the primary key.
Note: a primary key may be composite.
iv. Foreign Key

To define a relationship with another table of a database,
we use a foreign key.
A foreign key is a column of a table that points to the
primary key of another table.
The relation being referenced is called the referenced
relation, and the relation which refers to the referenced
relation is called the referencing relation.
• Department table

DEPT_ID  DEPT_NAME
001      CSE
002      IT
003      MECHANICAL

• Instructor table

DEPT_ID  INST_ID  NAME
002      B005     WALTER
002      B003     JESSE
001      B007     SKYLER

Here, Instructor.DEPT_ID is a foreign key referencing Department.DEPT_ID.
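A runnable sketch of these two tables (using Python's sqlite3; the schema follows the example above) shows the foreign key rejecting a row that references a non-existent department:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE department (dept_id TEXT PRIMARY KEY, dept_name TEXT)")
conn.execute("""CREATE TABLE instructor (
                  inst_id TEXT PRIMARY KEY,
                  name    TEXT,
                  dept_id TEXT REFERENCES department(dept_id))""")

conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [("001", "CSE"), ("002", "IT"), ("003", "MECHANICAL")])
conn.execute("INSERT INTO instructor VALUES ('B005', 'WALTER', '002')")

# Referencing a dept_id that does not exist violates the constraint.
try:
    conn.execute("INSERT INTO instructor VALUES ('B009', 'MIKE', '999')")
    print("inserted")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The second insert fails with an IntegrityError, because '999' is not a value of the referenced primary key Department.dept_id.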
v. Composite Key
A key with more than one attribute that together uniquely
identify a tuple in a relation is called a composite key.
Two or more attributes are used together to make a
composite key.
It can act as the primary key if no single attribute qualifies as one.
vi. Surrogate Key
A surrogate key in SQL is an artificial key that is used as a unique
identifier for each record in a table.

FIRST NAME  LAST NAME  DOB
JOHN        DOE        01-01-2000
JEAN        SMITH      02-02-2001

No natural attribute above is a reliable identifier, so we add STU_ID as a
surrogate key. It is an integer that automatically increments with each new
record, which ensures that each student has a unique identifier.
A surrogate key can act as a primary key.

STU_ID  FIRST NAME  LAST NAME  DOB
01      JOHN        DOE        01-01-2000
02      JEAN        SMITH      02-02-2001
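In SQLite, for instance, an INTEGER PRIMARY KEY column auto-assigns such identifiers (a sketch; the table name is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE student (
                  stu_id     INTEGER PRIMARY KEY,  -- surrogate key, auto-assigned
                  first_name TEXT,
                  last_name  TEXT,
                  dob        TEXT)""")

# We never supply stu_id; the database generates it for each row.
conn.executemany(
    "INSERT INTO student (first_name, last_name, dob) VALUES (?, ?, ?)",
    [("JOHN", "DOE", "01-01-2000"), ("JEAN", "SMITH", "02-02-2001")])

rows = conn.execute(
    "SELECT stu_id, first_name FROM student ORDER BY stu_id").fetchall()
print(rows)  # each row received a distinct integer stu_id
```

The application never chooses the key value, which is exactly what makes a surrogate key stable even when natural attributes (names, dates of birth) collide or change.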


BASIC STRUCTURE OF SQL
QUERIES
 Introduction to SQL Queries:
SQL queries are used to interact with relational databases to
retrieve information and perform operations on the data.
The basic structure of an SQL query consists of three main
clauses:
1. SELECT
2. FROM
3. WHERE
Components of SQL Queries:
i. SELECT Clause: Specifies the attributes (columns) that you want to retrieve
in the query result.
ii. FROM Clause: Specifies the tables (relations) from which you want to
retrieve data.
iii. WHERE Clause: Filters the rows based on specified conditions.
I. Queries on a Single Relation:
General format:
 SELECT attribute1, attribute2, ..., attributeN
 FROM relation_name
 WHERE condition;
i. SELECT Clause:
Determines the specific attributes (columns) you want to retrieve from the
database.
 Syntax: SELECT attribute1, attribute2, ..., attributeN
Example: SELECT name, dept_name FROM instructor; (retrieves name and
department name from the instructor table)
If we want to select all records from the table, we use the special character (*):
SELECT * FROM table_name;
• Ex:
SELECT * FROM instructor;
DISTINCT:
In some cases, values may be repeated in a particular
attribute.
In cases where we want to force the elimination of duplicates,
we insert the keyword DISTINCT after SELECT.
We can rewrite the preceding query as:
SELECT DISTINCT column_name FROM table_name;
SQL also allows the keyword ALL to specify explicitly that duplicates
are not removed: SELECT ALL dept_name FROM instructor;
2. FROM clause: Specifies the tables (relations) involved in the query.
 Syntax: FROM relation_name
 Example: SELECT name FROM instructor; Here instructor is the relation
from which we are selecting the name attribute to be displayed in
the output.
3. WHERE clause: The WHERE clause allows us to select only those rows in
the result relation of the FROM clause that satisfy a specified
predicate (condition).
It defines a condition that filters the retrieved data based on specific
criteria.
 Syntax: WHERE condition
 Example: SELECT name FROM instructor WHERE salary > 70000; (filters
instructors with salary over Rs. 70,000)
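The three clauses can be tried end-to-end with Python's sqlite3 (the instructor rows here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary REAL)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)",
                 [("WALTER", "CSE", 90000),
                  ("JESSE", "IT", 65000),
                  ("SKYLER", "CSE", 72000)])

# SELECT names the output columns, FROM names the relation,
# WHERE keeps only rows satisfying the predicate.
result = conn.execute(
    "SELECT name FROM instructor WHERE salary > 70000 ORDER BY name").fetchall()
print(result)  # instructors earning over 70,000
```

Only the rows passing the WHERE predicate reach the output, projected down to the name column.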
II. Queries on Multiple Relations
Queries often need to access information from multiple relations.
This process involves joining tuples from different relations based on
specified conditions.
Components of SQL Queries: SQL queries typically consist of three
main clauses:
i. SELECT clause: Lists the attributes desired in the query result.
ii. FROM clause: Specifies the relations to be accessed in the
evaluation of the query.
iii. WHERE clause: Contains predicates involving attributes of the
relations listed in the FROM clause to filter the results.
General syntax: SELECT A1, A2, ..., An FROM r1, r2, ..., rm WHERE P;
Here, each Ai represents an attribute, each ri represents a relation,
and P represents a predicate.
If the WHERE clause is omitted, the predicate P is considered to be true.
 Execution of SQL Queries:
Understanding the operations specified by an SQL query is best done
by considering the clauses in operational order:
i. FROM clause: Defines a Cartesian product of the relations listed
in it.
The Cartesian product combines tuples from all specified relations.
ii. WHERE clause: Predicates in the WHERE clause filter the tuples
generated by the Cartesian product, retaining only those that
satisfy the specified conditions.
iii. SELECT clause: Finally, the SELECT clause specifies the attributes to
be included in the output of the query.
SELECT name, instructor.dept_name, building
FROM instructor, department
WHERE instructor.dept_name = department.dept_name;
 Execution Process:
Naming conventions and prefixes:
To avoid confusion, attributes are prefixed with the name of the
relation from which they originate, e.g. instructor.dept_name.
For attributes present in only one relation, the
relation-name prefix can be omitted.
Step 1) FROM clause:
FROM instructor, department
The Cartesian product of the instructor and department
relations combines every tuple from the instructor relation
with every tuple from the department relation, regardless
of whether they are associated or not.
This means that each tuple in instructor is paired with every
tuple in department, even if they are unrelated.
 Cartesian product of the instructor and department tables:
• Step 2) WHERE clause:
• WHERE instructor.dept_name = department.dept_name
The WHERE clause in SQL queries is crucial for refining the results
obtained from the Cartesian product.
The predicate in the WHERE clause restricts the combinations
created by the Cartesian product to those that are meaningful for the
desired answer.
We aim to pair tuples from instructor only with those tuples from
department that are relevant.
For instance, we typically want to match tuples from department
with those tuples from instructor that share the same dept_name.
Step 3) SELECT clause:
• SELECT name, instructor.dept_name, building
For each tuple in the result of Step 2, output the attributes (or results
of expressions) specified in the SELECT clause.
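The same three-step evaluation can be run directly (the schema and rows below are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT)")
conn.execute("CREATE TABLE department (dept_name TEXT, building TEXT)")
conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                 [("WALTER", "CSE"), ("JESSE", "IT")])
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [("CSE", "TAYLOR"), ("IT", "WATSON"), ("MECH", "RAO")])

# FROM builds the 2 x 3 Cartesian product; WHERE keeps the pairs with
# matching dept_name; SELECT projects the requested attributes.
result = conn.execute("""
    SELECT name, instructor.dept_name, building
    FROM instructor, department
    WHERE instructor.dept_name = department.dept_name
    ORDER BY name""").fetchall()
print(result)
```

Of the six tuples in the Cartesian product, only the two with matching department names survive the WHERE clause.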
Hashing
Hashing in DBMS is a technique to quickly locate a data
record in a database, irrespective of the size of the database.
For larger databases containing thousands or millions of
records, index-based searching can become inefficient, because
locating a specific record through an index takes more time as
the index grows.
What is Hashing?
 There are three main components in hashing:
i. Hash table:
A hash table is an array-like data structure whose size is determined by
the total volume of data records present in the database.
Each memory location in a hash table is called a 'bucket' (or hash index);
it stores data records at an exact location that can be computed with
a hash function.
ii. Bucket:
A storage location where data is stored.
iii. Hash function:
A mathematical formula that takes the unique data (like
a user ID) and calculates where it should be stored in the
table (the bucket).
 Collision:
Sometimes, two different data entries (like two different IDs)
end up in the same bucket.
This is called a collision.
To handle collisions, databases use techniques like:
i. Chaining: Storing multiple entries in the same bucket using
a linked list.
ii. Open addressing:
Looking for the next empty bucket to store the data.
Instead of chaining, open addressing searches for the next available
bucket.
Types of Hashing

1. Static Hashing
2. Dynamic Hashing
i. Static Hashing
The number of buckets is fixed.
The same hash function is always used.
If you know the size of the database in advance, static
hashing works well.
Problem: if the data size grows, the fixed number of
buckets might not be enough, causing collisions.
Assume that we have a hash table with 4 buckets and
use a hash function that computes the bucket index
as the key modulo the number of buckets, i.e.
hash(key) % 4.
 Initial Setup:
 Number of buckets = 4
 Hash function = key % 4

1. Initial data (small dataset): let's say we insert keys 3, 7, 12, and 18
into this hash table. Using the hash function (key % 4):
 3 % 4 = 3 → goes to Bucket 3
 7 % 4 = 3 → goes to Bucket 3 (collision with key 3)
 12 % 4 = 0 → goes to Bucket 0
 18 % 4 = 2 → goes to Bucket 2
• Here, Bucket 3 already has a collision, because both 3 and 7 map to the
same bucket.
• To handle the collision, techniques like chaining or open addressing are
used.
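The worked example above can be reproduced with a tiny chained hash table (a sketch, not a DBMS-grade structure):

```python
NUM_BUCKETS = 4

# Chaining: each bucket is a list that can hold several colliding keys.
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key):
    buckets[key % NUM_BUCKETS].append(key)

def search(key):
    # Only the one bucket the hash function points at is examined.
    return key in buckets[key % NUM_BUCKETS]

for key in (3, 7, 12, 18):
    insert(key)

print(buckets)  # 3 and 7 collide and chain together in bucket 3
```

Because the bucket count is fixed, chains simply grow as more colliding keys arrive, which is exactly the performance problem static hashing runs into.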
Advantages:
i. Simplicity: Static hashing is easy to implement and understand due to
its fixed size.
ii. Fast access: Search and insert operations often take constant time, O(1).
iii. Predictable performance: Consistent performance for fixed-size data sets.
iv. Concurrency-friendly: Easier to manage in multi-threaded environments,
since no resizing is required.
Disadvantages:
i. Overflows: If a bucket is full or has too many entries, overflow occurs,
and additional storage mechanisms (like chaining) are needed.
ii. Decreased performance: As collisions increase, the time required to
insert and search for keys also increases, leading to slower performance.
iii. Fixed size: If the dataset grows beyond what was initially anticipated, the
fixed number of buckets can't handle the load efficiently.
ii. Dynamic Hashing ( Extendible Hashing) :
The number of buckets can grow or shrink as needed.
When there are too many collisions (too many records in one bucket),
new buckets are created to store data.
• Flexible Hash Function: The hash function adjusts based on the
number of records.
 Key Components:
1. Directory: A table that holds pointers to buckets. The directory can
grow or shrink based on the data size.
2. Buckets: Containers where records are stored. Each bucket can hold
multiple records, but if it gets full, it splits into two.
3. Global depth: Indicates the number of bits used from the hash
value to index into the directory.
4. Local depth: Represents how many bits are used to differentiate
between records in a particular bucket.
 Initial Setup:
Global Depth starts at 2 (k = 2), meaning we are using
2 bits from the hash value to index into the directory.
There are 4 buckets (00, 01, 10, 11) corresponding to
the 2-bit combinations.
At this point, the hash function uses the first 2 bits of each key's
hash to determine which bucket to put the data into. For
example:
Key 5 might hash to binary 01, so it goes into Bucket 01.
Key 8 might hash to binary 00, so it goes into Bucket 00
 Adding Keys
Let’s add some keys to the hash table:
Key 5 hashes to 01 → goes into Bucket 01.
Key 8 hashes to 00 → goes into Bucket 00.
Key 12 hashes to 00 → also goes into Bucket 00.
Key 7 hashes to 11 → goes into Bucket 11.
 Bucket Overflow

• Now, we add Key 20, which also hashes to 00. Since Bucket
00 already has Key 8 and Key 12, it overflows.
• When this happens:

1. The system checks the Local Depth of Bucket 00. In this
case, its local depth is 2, because it is using 2 bits to
distinguish its records.
2. Since the local depth equals the global depth, the directory is
doubled (the global depth becomes 3) before Bucket 00 is split
and its keys are redistributed between the two resulting buckets.
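The overflow, directory doubling, and bucket split can be sketched compactly (bucket capacity 2, integer keys, class names my own; note that Python's hash of a small non-negative integer is the integer itself, so keys map by their low bits just as in the slides):

```python
class Bucket:
    def __init__(self, depth, capacity=2):
        self.depth = depth          # local depth: bits this bucket distinguishes
        self.capacity = capacity
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]   # slots for suffixes 0 and 1

    def _index(self, key):
        # Use the low `global_depth` bits of the key's hash.
        return hash(key) & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < bucket.capacity:
            bucket.keys.append(key)
            return
        # Overflow: if local depth == global depth, double the directory first.
        if bucket.depth == self.global_depth:
            self.directory += self.directory[:]
            self.global_depth += 1
        # Split the bucket: one extra bit now distinguishes its records.
        bucket.depth += 1
        sibling = Bucket(bucket.depth, bucket.capacity)
        high_bit = 1 << (bucket.depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and (i & high_bit):
                self.directory[i] = sibling
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)
        self.insert(key)   # retry; may trigger a further split

    def contains(self, key):
        return key in self.directory[self._index(key)].keys

table = ExtendibleHash()
for k in (5, 8, 12, 7, 20):
    table.insert(k)
print(table.global_depth, all(table.contains(k) for k in (5, 8, 12, 7, 20)))
```

Inserting 20 overflows the bucket holding 8 and 12 (all three share the low bits 00), so the directory doubles and that bucket splits; the other buckets keep their smaller local depth and are simply pointed to by several directory slots.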
Distributed Databases

A distributed database is a collection of data that is spread across
multiple physical locations, often across different computers,
networks, or even geographical areas.
Despite the distribution, the system is managed as a single cohesive
unit, allowing users to access and manage the data as though it were
all stored in one place.
Distributed databases can be categorized into:
i. Homogeneous
ii. Heterogeneous
iii. Cloud based
I. Homogeneous Distributed
Database
In a homogeneous distributed database, all the individual databases
across different locations are of the same type.
This means they use the same software, schema, and data structures.
These databases are easier to manage because they follow the same
rules, processes, and architecture.
 Characteristics:
i. Same DBMS: All sites use the same database management system
(DBMS), making them compatible.
ii. Uniform Schema: The data structure, schema, and data format are
consistent across all locations.
iii. Easy Communication: Since all nodes follow the same architecture,
communication between databases is seamless and efficient.
iv. Centralized Management: All databases are managed centrally, and
any change in one location can easily be replicated to others.
v. Simplicity: Because of uniformity, homogeneous databases are
simple to implement and maintain.
• Example: A company might have databases in different offices (New
York, London, Tokyo), all using Oracle DBMS with the same schema
and structure.
II. Heterogeneous Distributed Databases
In a heterogeneous distributed database, the individual databases
may use different DBMS, schema, or data structures.
These systems must use middleware or translation layers to enable
communication between databases that are not directly compatible.
 Characteristics:
i. Different DBMS: The different sites use different DBMS (e.g., one
site uses Oracle, another uses MySQL, etc.).
ii. Different Schema: The schema or data structure may differ between locations.
iii. Middleware Requirement: Since the systems are not inherently
compatible, a middleware layer is required to enable data sharing
and communication between different databases.
iv. Complexity: Heterogeneous systems are more complex to manage
due to differences in data structures, query languages, and DBMS
software.
v. Flexibility: Despite the complexity, these databases provide
flexibility by allowing integration of various systems.
• Example: A multinational corporation might have one database
running on MySQL for internal use and another running on Oracle for
customer data.
• Middleware is needed to allow these databases to communicate.
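The middleware idea can be sketched as an adapter layer. This toy Python example is purely illustrative (the adapter class names and the `fetch_customers` request are invented, and real middleware would use actual database drivers); it shows one uniform call being translated into each system's SQL dialect.

```python
class OracleAdapter:
    """Stand-in adapter: translates the common request into Oracle-style SQL."""
    def fetch_customers(self):
        return self._run("SELECT name FROM customers FETCH FIRST 10 ROWS ONLY")

    def _run(self, sql):
        # real code would send sql through an Oracle driver; here we echo it
        return ("oracle", sql)

class MySQLAdapter:
    """Stand-in adapter: the same request, in MySQL-style SQL."""
    def fetch_customers(self):
        return self._run("SELECT name FROM customers LIMIT 10")

    def _run(self, sql):
        return ("mysql", sql)

class Middleware:
    """Routes one uniform call to whichever backend holds the data."""
    def __init__(self, backends):
        self.backends = backends

    def fetch_customers(self, site):
        return self.backends[site].fetch_customers()

# one interface, two incompatible systems behind it
mw = Middleware({"internal": MySQLAdapter(), "crm": OracleAdapter()})
```

The calling application only ever sees `fetch_customers`; the dialect differences stay hidden inside the adapters, which is the role the middleware layer plays in a heterogeneous system.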
III. Cloud-based Databases
Cloud-based databases are databases hosted on cloud computing
platforms like Amazon Web Services (AWS), Microsoft Azure, or
Google Cloud Platform (GCP).
The data is stored in data centers managed by cloud providers, and
users can access the data over the internet.
 Characteristics:
i. Scalability: Cloud databases can easily scale up or down based on
demand. You can add or reduce resources as needed, offering
flexibility.
ii. Cost-effective: Organizations pay for only the resources they use
(pay-as-you-go model), which can lower costs.
iii. Security: Cloud providers offer security measures such as encryption, access control, and firewalls to protect data.
iv. Accessibility: Since data is stored in the cloud, it can be accessed from anywhere, provided there is an internet connection.
 Types of Cloud Databases:
i. Relational Cloud Databases: Examples include Amazon RDS
(Relational Database Service), Google Cloud SQL, etc.
ii. NoSQL Cloud Databases: Examples include Amazon DynamoDB,
Google Cloud Fire store, etc.
iii. Hybrid Cloud Databases: Databases that combine on-premises and
cloud storage (e.g., Microsoft Azure Hybrid Cloud).
 Example: Amazon RDS allows companies to run a fully-managed
relational database in the cloud without worrying about the
underlying infrastructure.
TRANSACTIONS
A transaction is a group of operations that might involve reading or updating
data in a database.
These operations are wrapped between "begin transaction" and "end
transaction" statements.
 For example:
1. Start transaction.
2. Read checking account balance.
3. Deduct $50.
4. Write new balance to checking account.
5. Read savings account balance.
6. Add $50.
7. Write new balance to savings account.
8. End transaction.
In a database, a transaction is a set of operations that are treated as a
single unit.
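The eight steps above can be run as one atomic unit. A minimal sketch using Python's built-in sqlite3 module (the accounts table, account names, and balances are made up for illustration):

```python
import sqlite3

# autocommit mode: we issue BEGIN/COMMIT/ROLLBACK ourselves
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('checking', 200), ('savings', 100)")

def transfer(conn, amount):
    """Move `amount` from checking to savings as a single atomic unit."""
    cur = conn.cursor()
    cur.execute("BEGIN")                                # 1. start transaction
    try:
        (bal,) = cur.execute(                           # 2. read checking balance
            "SELECT balance FROM accounts WHERE name='checking'").fetchone()
        cur.execute(                                    # 3-4. deduct, write new balance
            "UPDATE accounts SET balance=? WHERE name='checking'", (bal - amount,))
        (bal,) = cur.execute(                           # 5. read savings balance
            "SELECT balance FROM accounts WHERE name='savings'").fetchone()
        cur.execute(                                    # 6-7. add, write new balance
            "UPDATE accounts SET balance=? WHERE name='savings'", (bal + amount,))
        cur.execute("COMMIT")                           # 8. end transaction
    except Exception:
        cur.execute("ROLLBACK")  # undo every step if anything failed
        raise

transfer(conn, 50)  # checking: 200 -> 150, savings: 100 -> 150
```

If any step between BEGIN and COMMIT raises an error, the ROLLBACK undoes all of them, so the $50 can never leave checking without arriving in savings.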
 TRANSACTION STATES :
A transaction in a DBMS is a sequence of one or more operations
(such as reading, writing, or updating data) that must be executed as
a single logical unit.
A transaction goes through several states during its lifecycle.
These states ensure that the transaction can be managed properly to
either commit changes or rollback in case of errors.
i. Active State
This is the initial state of a transaction when it starts executing the
transaction is considered active while it is performing its operations,
such as reading or modifying data in the database.
At this point, the transaction could still fail due to various reasons,
such as errors in the code, logical errors, or hardware failure.
ii. Partially Committed State
After the transaction completes its final operation, it enters the partially
committed state.
In this state, all operations of the transaction have been executed, but the
changes might still be residing in main memory (RAM) and haven’t yet been
permanently saved to disk.
It is still possible for the transaction to fail at this stage due to system crashes or
other hardware issues.
If the system fails at this point, the changes are not yet guaranteed to be
permanent, and the transaction might need to be rolled back.
iii. Committed State
When all the changes made by the transaction are safely written to disk (i.e.,
stored permanently), the transaction enters the committed state.
At this point, the transaction has successfully completed, and its effects on the
database are permanent.
 Once a transaction is committed, the system guarantees that its changes will
survive even in the event of a system crash or failure.
iv. Failed State
The transaction enters the failed state when the system detects a failure, either
due to a logical error in the transaction's code, invalid input, or a hardware
failure.
In the failed state, the transaction cannot continue executing normally, and its
operations are considered invalid.
Failures can happen at any time before the transaction commits, such as during
execution (active state) or even after the final operation (partially committed
state) but before the commit.
v. Aborted State
Once a transaction has entered the failed state, it must be rolled back, meaning
all changes made by the transaction must be undone.
After the rollback process is completed, the transaction enters the aborted state.
 In this state, the database is restored to the condition it was in before the
transaction began, ensuring that the transaction has no effect.
Depending on the situation, the aborted transaction might be restarted as a new
transaction or might be canceled entirely (e.g., if there’s a logical error).
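The five states and their legal transitions can be captured in a small state machine. This is an illustrative sketch, not a real DBMS API; the state names follow the text above.

```python
# legal transitions between the five transaction states described above
TRANSITIONS = {
    "active":              {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "committed":           set(),  # terminal state
    "aborted":             set(),  # terminal state (a restart is a new transaction)
}

class Transaction:
    def __init__(self):
        self.state = "active"  # every transaction starts in the active state

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

# a successful run: active -> partially committed -> committed
t = Transaction()
t.move_to("partially_committed")
t.move_to("committed")
```

The transition table also rules out impossible paths, e.g. a committed transaction can never move back to active, and a failed transaction can only be aborted.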
DEADLOCK HANDLING
Deadlock occurs in a database management system (DBMS) when
multiple transactions are stuck, each waiting for resources held by the
other transactions.
None of the transactions can proceed because they are all waiting for
each other, creating a cycle of dependency.
 We have four types of handling methods:
1. Deadlock Prevention
2. Deadlock Detection
3. Deadlock Recovery
4. Deadlock Avoidance
i. Deadlock Prevention
In this method, the system ensures that deadlock never occurs by preventing at least one of the four necessary conditions for deadlock:
 Mutual Exclusion: Resources cannot be shared; only one transaction can hold a resource at a time.
 Hold and Wait: A transaction holds one resource while waiting for another.
 No Preemption: Once a transaction holds a resource, it cannot be forcibly taken away.
 Circular Wait: A circular chain of transactions exists, where each transaction waits for a resource held by the next in the chain.
Example: Consider two transactions, T1 and T2, accessing resources R1 and R2.
 T1 locks R1 and needs R2.
 T2 locks R2 and needs R1.
This creates a circular wait condition, where T1 and T2 are both waiting for the
other to release a resource.
Techniques for Deadlock Prevention:
 Wait-Die Scheme: Older transactions wait for younger ones to release resources, while younger transactions are terminated (die) and restarted.
Example: If T1 (older) requests R2 held by T2 (younger), T1 will wait. If T2 (younger) requests R1 held by T1 (older), T2 will be aborted and restarted.
 Wound-Wait Scheme: Older transactions wound (preempt) younger ones by forcing them to release their resources and restart, while younger transactions wait for older ones.
Example: If T1 (older) requests R2 held by T2 (younger), T2 is aborted (wounded) and restarted. If T2 (younger) requests R1 held by T1 (older), T2 waits.
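Both schemes reduce to a timestamp comparison. A small sketch (the function name and the returned action strings are invented for illustration; a smaller timestamp means an older transaction):

```python
def resolve(scheme, requester_ts, holder_ts):
    """Decide what happens when `requester` asks for a lock held by `holder`."""
    older = requester_ts < holder_ts  # is the requester older than the holder?
    if scheme == "wait-die":
        # older waits; younger dies (is aborted and later restarted)
        return "wait" if older else "die"
    if scheme == "wound-wait":
        # older wounds (aborts) the holder; younger waits
        return "wound holder" if older else "wait"
    raise ValueError(f"unknown scheme: {scheme}")

# T1 is older (timestamp 1) than T2 (timestamp 2)
assert resolve("wait-die", 1, 2) == "wait"            # T1 requests from T2: T1 waits
assert resolve("wait-die", 2, 1) == "die"             # T2 requests from T1: T2 dies
assert resolve("wound-wait", 1, 2) == "wound holder"  # T1 requests from T2: T2 is wounded
assert resolve("wound-wait", 2, 1) == "wait"          # T2 requests from T1: T2 waits
```

In both schemes the older transaction always makes progress eventually, which is what breaks the circular-wait condition and prevents deadlock.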
ii. Deadlock Detection
In this approach, the DBMS allows transactions to proceed and checks for
deadlock periodically.
The wait-for graph is a tool used to detect deadlocks. It’s a directed graph
where:
 Nodes represent transactions.
 An edge from T1 to T2 indicates that T1 is waiting for a resource held by T2.
 If a cycle is detected in the wait-for graph, a deadlock has occurred.
 The wait-for graph is maintained by the system for every transaction that is waiting for data held by others.
 The system periodically checks the graph for cycles.
Example:
 T1 locks R1 and waits for R2 (held by T2).
 T2 locks R2 and waits for R1 (held by T1).
The wait-for graph will have a cycle, indicating a deadlock between T1 and T2.
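Cycle detection on a wait-for graph can be sketched with a depth-first search. A minimal Python version (the graph is represented as a plain dictionary; this is illustrative, not a DBMS internal):

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph via depth-first search.
    `wait_for` maps each transaction to the set of transactions it waits on.
    """
    visiting, done = set(), set()

    def dfs(t):
        visiting.add(t)
        for nxt in wait_for.get(t, ()):
            if nxt in visiting:              # back edge: a cycle exists
                return True
            if nxt not in done and dfs(nxt):
                return True
        visiting.discard(t)
        done.add(t)
        return False

    return any(dfs(t) for t in wait_for if t not in done)

# T1 waits for T2, and T2 waits for T1 -> cycle -> deadlock
assert has_deadlock({"T1": {"T2"}, "T2": {"T1"}})
# T1 waits for T2, T2 waits for nothing -> no cycle -> no deadlock
assert not has_deadlock({"T1": {"T2"}, "T2": set()})
```

The T1/T2 example from the text is exactly the first graph: the edge T1 → T2 and the edge T2 → T1 form the cycle that the search reports.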
 Action Taken:
Once a deadlock is detected, the DBMS chooses a victim—a transaction to roll
back.
The choice is based on:
 Cost of rollback: How much work will be lost if the transaction is rolled back.
 Transaction age: Younger transactions may be chosen as victims since they have
done less work. The victim transaction is rolled back to release resources,
breaking the deadlock.
Example: If T2 is chosen as the victim, it will be rolled back, releasing R2 and
allowing T1 to proceed.
iii. Deadlock Recovery
Once a deadlock is detected, the system must recover by resolving it.
The DBMS rolls back one or more transactions to break the cycle.
 Steps for Deadlock Recovery:
1. Transaction rollback: Roll back a transaction to its previous state to
release its locks.
2. Transaction restart: After a rollback, the transaction may be
restarted.
3. Victim selection: The DBMS chooses which transaction to roll back
based on factors like transaction priority, age, or the cost of
restarting.
Example: If the DBMS detects that T1 and T2 are in deadlock, it
selects T2 as the victim, rolls it back, and releases its resources.
T1 can then acquire R2 and proceed.
iv. Deadlock Avoidance
In deadlock avoidance, the DBMS ensures that a transaction proceeds
only if it is safe to do so.
A safe state is one where there is no possibility of deadlock.
One popular algorithm for deadlock avoidance is the Banker’s
Algorithm.
The system checks if allocating a resource to a transaction keeps the
system in a safe state.
If it does not, the transaction must wait.
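The safety check at the heart of the Banker's Algorithm can be sketched as follows. The resource counts and the two-transaction example are made up for illustration; the idea is to simulate whether every transaction can finish in some order.

```python
def is_safe(available, allocation, need):
    """Banker's safety check: can every transaction finish in *some* order?
    `available`: free units of each resource type.
    `allocation` / `need`: per-transaction held units and remaining demand.
    """
    work = list(available)
    finished = [False] * len(allocation)
    while True:
        progressed = False
        for i, (alloc, nd) in enumerate(zip(allocation, need)):
            if not finished[i] and all(n <= w for n, w in zip(nd, work)):
                # transaction i can run to completion, then releases its holdings
                work = [w + a for w, a in zip(work, alloc)]
                finished[i] = True
                progressed = True
        if not progressed:
            return all(finished)  # safe only if everyone could finish

# two resource types; each transaction can finish with what is available
assert is_safe(available=[1, 1],
               allocation=[[0, 1], [1, 0]],
               need=[[1, 0], [0, 1]])
```

Before granting a resource request, the system would tentatively apply it and call a check like this; if the resulting state is unsafe, the requesting transaction is made to wait instead.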
SHORT
1. DBMS definition ?
DBMS is a collection of related data and various programs that are used to
handle that data.
DBMS is a software that manages and controls access to the database.
2. Write Advantages and Dis-advantages of DBMS ?
 Advantages:
DBMS removes the data redundancy so there is no duplication of data in
database.
Data can be isolated in separate tables for convenient and efficient use.
The DBMS allows concurrent access to multiple users by using the
synchronization technique.
The atomicity of data can be maintained.
 Dis-advantages
1. Complex design: Database design is complex, difficult and time
consuming.
2. Hardware and software cost: Large amount of investment is needed
to setup the required hardware or to repair software failure.
3. Damaged part: If one part of database is corrupted or damaged, then
entire database may get affected.
4. Conversion cost: If the current system is in conventional file system
and if we need to convert it to database systems then large amount of
cost is incurred in purchasing different tools and adopting different
techniques as per the requirement.
3. What is Data Independence ?
Data independence refers to the ability to modify the schema at one level of the database system without altering the schema at the next higher level.
4. Define DBA ?
A Person who has the central control over the system
(Data and programs) is called a Database administrator(DBA).
5. Write a Short note on RDBMS ?
A Relational Database Management System (RDBMS) is a software
system used to manage, store, and retrieve data organized in a
relational model.
In this model, data is stored in tables (also known as relations)
consisting of rows (records) and columns (attributes).
RDBMSs use Structured Query Language (SQL) for querying and
manipulating the data.
6.Write a note on Modification of Database ?
Relational Algebra itself is more focused on querying and transforming
the data rather than modifying it.
Database modification is crucial, and in the context of Relational Algebra,
these operations are often conceptualized by extending the language.
The three primary modification operations are Insert, Delete, and
Update.
7.Write a Short note on SQL ?
SQL (Structured Query Language) is a standardized programming
language designed for managing and manipulating relational databases.
It is widely used for querying, updating, and organizing data stored in a
database.
SQL is the backbone of many database management systems, such as
MySQL, PostgreSQL, Oracle, and Microsoft SQL Server.
8.What is Aggregate function and its types in SQL.
• Aggregate functions are functions that take a collection (a set or
multiset) of values as input and return a single value.
• SQL offers five built-in aggregate functions:
i. Average: avg
ii. Minimum: min
iii. Maximum: max
iv. Total: sum
v. Count: count
• The input to sum and avg must be a collection of numbers, but the
other operators can operate on collections of nonnumeric data types,
such as strings, as well.
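A quick demonstration of all five aggregates, using Python's built-in sqlite3 module (the instructor table and the salary values are invented sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, salary REAL)")
conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                 [("Rao", 60000), ("Iyer", 80000), ("Khan", 70000)])

# each aggregate collapses the whole column into a single value
row = conn.execute(
    "SELECT avg(salary), min(salary), max(salary), sum(salary), count(*) "
    "FROM instructor").fetchone()
print(row)  # (70000.0, 60000.0, 80000.0, 210000.0, 3)
```

Note that min, max, and count would also work on the TEXT column name, while avg and sum require numeric input.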
9. What is Indexing ?
Indexing is a data structure technique used to quickly locate and
access the data in a database table without scanning the entire table.
It improves the speed of data retrieval operations on a database at
the cost of additional storage space for the index structure.
10. Define File organization ?
In a database, a file is organized as a sequence of records, which are mapped onto disk blocks.
Since records vary in size, different techniques are required to
manage their storage effectively.
This is important for ensuring efficient file operations like inserting,
deleting, and retrieving records.
11. What are Properties of transaction ?
The ACID properties are used to ensure that the system runs smoothly and the data remains accurate.
i. Atomicity
ii. Consistency
iii. Isolation
iv. Durability
12. What are Different types of failures in a recovery
system ?
 Types of Failures
i. Transaction Failure
ii. System Crash
iii. Disk Failure