DBMS
DBMS
1
Database Management System
CIE Marks: 90
CLO 1 Describe the fundamental concepts of databases, including data models, database
architectures, and database management systems. Students will be able to explain
the benefits and limitations of different types of databases, and identify use cases
where each type would be appropriate.
CLO 2 Understand and apply principles of database design, including entity-relationship
modeling, normalization, and data modeling tools. Students will be able to create
and optimize databases that are efficient, scalable, and maintainable.
CLO 3 Create and execute SQL queries to retrieve and manipulate data in a relational
database. Students will be able to design and implement SQL statements for a
variety of use cases, including querying, updating, and aggregating data.
2
Course Learning Outcome (CLOs): After Completing this
course successfully, the student will be able to…
3
Summary of Course Content
2 Relational Database Design: This topic would cover the process 10 CLO2
of designing a relational database, including entity-relationship CLO4
4
4 Database Administration and Security: This topic would cover 10 CLO 4
the tasks of database administration, including backup and
recovery, database security, and performance tuning. Students
would learn how to manage databases in a secure and efficient
manner.
5 NoSQL Databases: This topic would introduce non-relational 10 CLO 5
databases such as MongoDB, Cassandra, and Redis. Students
would learn how to design and query non-relational databases
and compare and contrast them with traditional relational
databases.
Recommended Books:
• "Database Systems: The Complete Book" by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer
Widom. This book covers database concepts, design, and implementation in depth, with a focus on
relational databases. The latest edition is the 2nd edition, published in 2008.
• "Fundamentals of Database Systems" by Ramez Elmasri and Shamkant B. Navathe. This book
provides an introduction to database concepts, design, and implementation, with a focus on
relational databases. The latest edition is the 7th edition, published in 2016.
5
Assessment Pattern
CIE- Continuous Internal Evaluation
(90 Marks)
Bloom's
Category Tests Assignments Quizzes Attendance
Marks (out of (45) (15) (15) (15)
90)
Remember 5 03
Understand 5 04 05
Apply 15 05 05
Analyze 10
Evaluate 5 03 05
Create 5
6
Week Topic Teaching Assessment Alignment
No Learning Strategy(s) to CLO
Strategy(s)
1 -Introduction to Database Systems Lecture, Quiz, Assignment CLO1
Reading
-Understand the history and importance of Assignments
database systems
2 Data Models Lecture, QA, Quiz, CLO1, CLO2
Reading Assignment
Discuss different types of data models and their Assignments
Applications
3 Database Design ,ER-Diagram and Unified Lecture, Case Studies Quiz, CLO2
Modeling Language Assignment
7
6 SQL Basics Lecture, Case Studies Quiz, CLO2
Assignme
Learn basic SQL syntax and commands nt
8
12 Hashing Lecture, Case Studies Quiz, CLO4
Hash Based Indexing, Tree based Indexing Case
Study
Reports
10
What is a Database?
Characteristics of Databases:
• Organized and structured.
• Persistent and stored electronically.
• Accessible by multiple users or
applications.
What is a Database Management System
(DBMS)?
23
Introduction to Data Models
Definition: A data model defines the
structure, organization, and constraints of
data stored in a database. It provides a
framework for how data is stored, accessed,
and manipulated.
Purpose of Data Models:
◦ To represent real-world entities and their
relationships.
◦ To ensure data consistency and integrity.
◦ To facilitate the design and implementation of
databases.
Components of a Data Model:
Structure: Defines how data is
organized (e.g., tables, hierarchies).
Operations: Specifies the methods to
manipulate data (e.g., insert, update).
Constraints: Enforces rules to maintain
data integrity.
Types of Data Models
39
Introduction to Database
Design
Definition: Database design is the
process of defining the structure, storage,
and organization of data in a database to
ensure efficiency, scalability, and data
integrity.
Objectives of Database Design:
◦ To minimize data redundancy.
◦ To ensure data consistency and integrity.
◦ To enhance data retrieval and manipulation
efficiency.
Introduction to Database
Design(cont.)
Phases of Database Design:
Conceptual Design
Logical Design
Physical Design
Phases of Database Design
Conceptual Design
Definition: Focuses on high-level
representation of the data structure using
Entity-Relationship (ER) diagrams.
Key Components:
◦ Entities: Objects or things in the real world
(e.g., Customer, Product).
◦ Attributes: Properties of entities (e.g.,
Customer Name, Product Price).
◦ Relationships: Associations between entities
(e.g., Customer purchases Product).
Phases of Database Design
Tools Used:
◦ ER Diagrams
◦ UML Diagrams
Example: An ER diagram representing
Customers and Orders where a
Customer can place multiple Orders.
Phases of Database Design
Logical Design
Definition: Converts the conceptual
design into a logical model that can be
implemented in a database management
system (DBMS).
Steps Involved:
◦ Transform entities into tables.
◦ Define primary keys for each table.
◦ Define relationships using foreign keys.
◦ Normalize the database.
Phases of Database Design
Normalization:
Process of organizing data to minimize
redundancy and improve integrity.
Normal Forms:
◦ First Normal Form (1NF): Eliminate
repeating groups; ensure atomic values.
◦ Second Normal Form (2NF): Eliminate
partial dependencies; all non-key attributes
depend on the whole primary key.
◦ Third Normal Form (3NF): Eliminate
transitive dependencies; non-key attributes
depend only on the primary key.
Phases of Database Design
Physical Design
Definition: Focuses on the physical storage and
performance optimization of the database.
Key Considerations:
◦ Indexing: Create indexes on frequently queried
columns for faster data retrieval.
◦ Partitioning: Divide large tables into smaller, more
manageable pieces.
◦ Storage Optimization: Ensure efficient use of disk
space.
Example:
◦ Adding an index on the "CustomerID" column to
speed up searches.
Steps in Database Design
Requirement Analysis
◦ Gather and analyze the requirements of the
database from stakeholders.
◦ Understand the data needs, use cases, and
constraints.
Conceptual Modeling
◦ Use ER diagrams to visualize entities, attributes,
and relationships.
Logical Modeling
◦ Convert the conceptual model into a relational
schema.
◦ Normalize the schema to eliminate redundancy.
Steps in Database Design
Physical Modeling
◦ Define how the data will be stored on the disk.
◦ Optimize for performance and storage efficiency.
Implementation
◦ Use a DBMS to create the database schema and
populate it with data.
Testing and Validation
◦ Verify that the database meets requirements and
performs efficiently.
Key Principles of Good
Database Design
Simplicity: Keep the design as simple as
possible while meeting requirements.
Scalability: Ensure the database can
handle growing amounts of data.
Integrity: Enforce data integrity through
constraints and normalization.
Flexibility: Design for adaptability to
future changes.
Performance: Optimize for query speed
and storage efficiency.
Challenges in Database Design
Balancing normalization and performance.
Handling complex relationships and large
datasets.
Addressing security and access control
requirements.
Accommodating changes in requirements.
Tools for Database Design
ERD Tools:
◦ Lucidchart, draw.io, Microsoft Visio
DBMS Tools:
◦ MySQL Workbench, Microsoft SQL Server
Management Studio (SSMS), Oracle SQL
Developer
Other Tools:
◦ dbdiagram.io, DbSchema, Navicat
Week 4
52
Relational Algebra and Relational
Calculus
Relational Algebra and Relational Calculus are
two core theoretical concepts in the field of
database management systems (DBMS). These
languages provide the foundation for querying
and manipulating relational databases. While
relational algebra is a procedural language,
relational calculus is a non-procedural (or
declarative) language. Understanding these
concepts is crucial for anyone studying
databases and query languages like SQL
Relational Algebra
Relational Algebra is a procedural query
language, meaning it describes a sequence
of operations that need to be performed
on the data. The result of each operation
is a relation (a table), and these relations
can be further manipulated by other
operations.
Basic Operations in Relational
Algebra
1. Select (σ)
• Description: The Select operation is
used to filter rows based on a specified
condition. It is equivalent to the WHERE
clause in SQL.
• Syntax: σ (Condition)(Relation)
• Example: σ (Age > 30)(Employee)
• This operation will return all rows from the
Employee relation where the Age is greater
than 30.
Basic Operations in Relational
Algebra
2. Project (π)
Description: The Project operation is
used to select specific columns from a
relation. It is similar to the SELECT clause
in SQL.
Syntax: π (Column1, Column2,
...)(Relation)
Example: π Name, Age (Employee)
◦ This operation will return only the Name and
Age columns from the Employee relation.
Basic Operations in Relational
Algebra
3. Union (∪)
Description: The Union operation
combines the results of two relations and
returns all distinct rows. It is similar to
the UNION operator in SQL.
Syntax: Relation1 ∪ Relation2
Example: Employee ∪ Manager
◦ This operation will return a new relation that
includes all employees and managers,
eliminating duplicates.
Basic Operations in Relational
Algebra
4. Set Difference (−)
Description: The Set Difference
operation returns the rows that are in
one relation but not in another. It is
similar to the EXCEPT operator in SQL.
Syntax: Relation1 − Relation2
Example: Employee − Manager
◦ This operation will return all employees who
are not managers.
Basic Operations in Relational
Algebra
5. Cartesian Product (×)
Description: The Cartesian Product
operation combines each row of one
relation with each row of another relation. It
produces a relation with every possible
combination of rows.
Syntax: Relation1 × Relation2
Example: Employee × Department
◦ This operation will combine each employee with
each department, resulting in a relation where
every employee is paired with every department.
Basic Operations in Relational
Algebra
6. Rename (ρ)
Description: The Rename operation
changes the name of a relation or its
attributes.
Syntax: ρ (NewRelationName)(Relation)
Example: ρ (Employee1)(Employee)
◦ This operation renames the relation
Employee to Employee1.
Basic Operations in Relational
Algebra
7. Join (⨝)
Description: The Join operation
combines rows from two relations based
on a common attribute. It is often used to
combine data from two related tables.
Syntax: Relation1 ⨝ Relation2
Example: Employee ⨝ Department
◦ This operation will join the Employee and
Department relations based on the common
attribute (for instance, Dept_ID).
Basic Operations in Relational
Algebra (Extended Operations)
In addition to the basic operations,
relational algebra also includes more
advanced operations:
Natural Join (⨝)
◦ Description: A Natural Join is a specific type
of join where only columns with the same
name in both relations are used as the basis
for joining.
◦ Example: Employee ⨝ Department
This will automatically join the relations based on
common attributes like Dept_ID.
Week 5
63
Basic Operations in Relational
Algebra (Extended Operations)
Division (÷)
Description: The Division operation is used
when we need to find tuples in one relation
that are related to all tuples in another
relation.
Example:
◦ Suppose we have two relations: Employee(Name,
Skill) and SkillRequired(Skill).
◦ The division operation helps to find employees
who possess all skills listed in the SkillRequired
relation.
Relational Calculus
Relational Calculus is a declarative query
language used to describe what data to
retrieve rather than how to retrieve it. It
defines queries using predicates and
logical formulas.
There are two main types of relational
calculus:
Tuple Relational Calculus (TRC)
In Tuple Relational Calculus, we specify the
desired tuples (or rows) of a relation using
variables and conditions that must be
satisfied. It uses tuple variables and
predicates.
Syntax: {t | P(t)}
◦ t is a tuple variable (a row in the relation), and
P(t) is the predicate that the tuple must satisfy.
Example: {t.Name | ∃e (Employee(e) ∧
e.Age > 30 ∧ e.Name = t.Name)}
◦ This query finds the names of employees whose
age is greater than 30.
Domain Relational Calculus (DRC)
In Domain Relational Calculus, we deal with
domain variables rather than tuples. Each
domain variable represents an individual
attribute value.
Syntax: {<x1, x2, ..., xn> | P(x1, x2, ..., xn)}
◦ x1, x2, ..., xn are domain variables, and P(x1, x2, ...,
xn) is the predicate.
Example: {<Name> | ∃Age
(Employee(Name, Age) ∧ Age > 30)}
◦ This query retrieves the names of employees
whose age is greater than 30.
Key Differences between TRC and
DRC
TRC uses tuples as variables, while DRC
uses domain variables (attribute values).
TRC is more expressive and intuitive in
some cases because it works directly with
tuples.
DRC is often simpler when dealing with
specific attributes, as it treats each
attribute separately.
Relational Algebra vs. Relational
Calculus
Procedural vs. Declarative: Relational Algebra
tells us how to get the result, whereas Relational
Calculus tells us what the result should be, leaving
the how to the DBMS.
Execution: While relational algebra is more
procedural and closely related to how queries are
processed in a DBMS, relational calculus is more
about expressing the logic of what is needed from
the database.
Application: SQL is closely based on relational
algebra, but many features of SQL (such as
subqueries) are more closely related to relational
calculus.
Practical Applications and SQL
Both relational algebra and relational
calculus serve as the foundation for SQL.
SQL queries are generally equivalent to
relational algebra expressions. However,
SQL also incorporates additional features
that go beyond pure relational algebra or
relational calculus, such as aggregation
(e.g., COUNT, SUM), ordering, and nested
queries.
Practical Applications and SQL
For example:
A query like SELECT Name, Age FROM
Employee WHERE Age > 30 is based on
the relational algebra expression π Name,
Age (σ Age > 30 (Employee)).
A subquery like SELECT Name FROM
Employee WHERE Age > (SELECT
AVG(Age) FROM Employee) is based on
relational calculus.
Week 6
72
Basic SQL (Structured Query
Language)
SQL (Structured Query Language) is the
standard language used for managing and
manipulating relational databases. It
provides a powerful and easy-to-
understand syntax for querying, inserting,
updating, and deleting data. SQL is the
foundation for working with relational
database management systems (RDBMS)
like MySQL, PostgreSQL, SQL Server, and
Oracle.
SQL Basics
SQL operates on a set of relational tables
(or relations) and uses queries to
perform various operations like data
retrieval, updates, and table management.
SQL is divided into different categories
based on the operations being performed.
Why SQL?
Data Management: Efficiently organize and
manage large datasets.
Querying Data: Retrieve specific data from
massive databases.
Data Analysis: Perform aggregations, filtering,
and data transformations.
Interoperability: Works with all relational
database systems (MySQL, PostgreSQL, SQLite,
etc.).
Portability: SQL can run across different
platforms with minimal changes.
Standardization: SQL is governed by standards,
ensuring consistency.
Categories of SQL Commands
Data Definition Language (DDL): Used to define and
modify database structures.
◦ Examples: CREATE, ALTER, DROP
Data Manipulation Language (DML): Used to
manipulate data within tables.
◦ Examples: INSERT, UPDATE, DELETE
Data Query Language (DQL): Used to query and
retrieve data.
◦ Example: SELECT
Data Control Language (DCL): Used to manage access
and permissions.
◦ Examples: GRANT, REVOKE
Transaction Control Language (TCL): Used to manage
database transactions.
◦ Examples: COMMIT, ROLLBACK, SAVEPOINT
Basic SQL Commands
Data Query Language (DQL)
The most common and essential SQL
command for querying data is SELECT.
SELECT: Retrieves data from one or
more tables in the database.
Syntax:
SELECT column1, column2, ... FROM table_name WHERE
condition;
Basic SQL Commands
Example:
Example:
SELECT Name, Age FROM Employee
ORDER BY Age DESC;
Syntax:
SELECT column1, column2 FROM
table_name LIMIT number_of_rows;
Basic SQL Commands
LIMIT / OFFSET: Limits the number of
rows returned by a query (useful for
pagination).
Example:
SELECT Name FROM Employee LIMIT 5;
90
Basic SQL Commands
Example:
101
Joins in SQL
Example: REVOKE SELECT ON Employee
FROM user1;
SQL joins allow you to combine rows from
two or more tables based on a related
column between them. Joins are essential for
querying multiple related tables.
INNER JOIN: Returns only the rows
where there is a match in both tables.
Syntax:
SELECT columns FROM table1 INNER JOIN table2 ON
table1.column = table2.column;
Joins in SQL
Example:
SELECT Employee.Name,
Department.Name
FROM Employee
INNER JOIN Department
ON Employee.Department =
Department.DepartmentID;
Joins in SQL
LEFT JOIN (OUTER JOIN): Returns
all rows from the left table and matching
rows from the right table. If there is no
match, NULL values are returned for
columns from the right table.
Syntax:
SELECT columns FROM table1 LEFT JOIN table2
ON table1.column = table2.column;
Joins in SQL
Example:
SELECT Employee.Name,
Department.Name
FROM Employee
LEFT JOIN Department
ON Employee.Department =
Department.DepartmentID;
Joins in SQL
RIGHT JOIN (OUTER JOIN): Returns
all rows from the right table and matching
rows from the left table. If there is no
match, NULL values are returned for
columns from the left table.
Syntax:
SELECT columns FROM table1 RIGHT JOIN table2 ON
table1.column = table2.column;
Joins in SQL
FULL OUTER JOIN: Returns all rows
when there is a match in one of the
tables.
Syntax:
SELECT columns FROM table1 FULL OUTER
JOIN table2 ON table1.column = table2.column;
SQL Functions
SQL provides several built-in functions to
perform operations on data.
Aggregate Functions:
◦ COUNT(): Counts the number of rows.
◦ SUM(): Calculates the sum of a numeric
column.
◦ AVG(): Calculates the average of a numeric
column.
◦ MAX(): Returns the maximum value.
◦ MIN(): Returns the minimum value.
SQL Functions
Example:
124
Constraints in DBMS
Constraints are predefined rules enforced
on data in database tables to maintain the
integrity, validity, and consistency of the
data. Constraints restrict the type of data
that can be entered into a table.
Constraints in DBMS
Types of Constraints:
NOT NULL Constraint
Ensures that a column cannot have NULL
values, i.e., every row must have a value for this
column.
Used when a field is mandatory.
Syntax
CREATE TABLE Employee (
EmployeeID INT,
Name VARCHAR(50) NOT NULL
);
Constraints in DBMS
Example:A Name column in an
Employee table must always have a value.
Constraints in DBMS
UNIQUE Constraint
◦ Ensures all values in a column or a combination
of columns are unique.
◦ Allows NULL values (but only one per column).
Syntax:
CREATE TABLE Employee (
EmployeeID INT UNIQUE,
Name VARCHAR(50)
);
Example:Employee ID must be unique to
identify each employee.
Constraints in DBMS
PRIMARY KEY Constraint:
◦ Combines NOT NULL and UNIQUE.
◦ Uniquely identifies each record in the table.
◦ A table can only have one PRIMARY KEY.
Syntax:
CREATE TABLE Employee (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(50)
);
Constraints in DBMS
PRIMARY KEY Constraint:
◦ Composite Primary Key
Example:
CREATE TABLE Enrollment (
StudentID INT,
CourseID INT,
PRIMARY KEY (StudentID, CourseID)
);
Constraints in DBMS
FOREIGN KEY Constraint:
Enforces referential integrity by linking
columns in two tables.
Ensures that the value in a column
matches a value in another table.
Constraints in DBMS
FOREIGN KEY Constraint:
Syntax:
CREATE TABLE Department (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(50)
);
137
Views in DBMS
A view is a virtual table based on the
result of a query. It does not store data
itself but retrieves data from one or more
base tables.
Views in DBMS
Characteristics of Views
Logical Representation: Views are used to
represent subsets of data logically.
No Data Storage: Views do not store data
themselves.
Dynamic: Always reflect the latest data in the
underlying tables.
Read and Write Access: Depending on the
database, views may allow updates.
Views in DBMS
Creating a View:
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;
Example:
Views in DBMS
Example
CREATE VIEW ActiveEmployees AS
SELECT EmployeeID, Name, Status
FROM Employee
WHERE Status = 'Active';
Views in DBMS
Querying a View:
Views can be queried like tables.
Syntax:
DROP VIEW view_name;
Example:
DROP VIEW ActiveEmployees;
Views in DBMS
Materialized Views:
◦ Unlike regular views, materialized views store
query results physically on disk.
◦ They are useful for performance optimization
when dealing with large datasets.
Syntax:
CREATE MATERIALIZED VIEW
materialized_view_name AS
SELECT column1, column2
FROM table_name;
Views in DBMS
Advantages of Views:
Data Security:
Restrict access to specific rows or columns.
Example
CREATE VIEW ManagerView AS
SELECT EmployeeID, Name
FROM Employee
WHERE Role = 'Manager';
Views in DBMS
Simplification:
Encapsulate complex queries.
Example:
CREATE VIEW EmployeeStats AS
SELECT DepartmentID, COUNT(*) AS TotalEmployees
FROM Employee
GROUP BY DepartmentID;
Logical Independence:
◦ Abstract underlying table structure from users.
Data Consistency:
◦ Provide a consistent view of the database.
Views in DBMS
Constraints vs. Views
Aspect Constraints Views
Ensure data validity and
Simplify data retrieval and
Purpose integrity during
restrict access.
insert/update.
Exist as part of table Virtual tables; may store
Data Storage
metadata. data (materialized views).
Prevent invalid data from
Provide customized data
Scope being entered into the
representations for queries.
table.
Filtered views, aggregations,
NOT NULL, CHECK,
Examples or specific columns from
FOREIGN KEY.
multiple tables.
Views in DBMS
Simplification:
Encapsulate complex queries.
Example:
CREATE VIEW EmployeeStats AS
SELECT DepartmentID, COUNT(*) AS TotalEmployees
FROM Employee
GROUP BY DepartmentID;
Logical Independence:
◦ Abstract underlying table structure from users.
Data Consistency:
◦ Provide a consistent view of the database.
Views in DBMS
Practical Examples:
Example: Constraints
CREATE TABLE Product (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(50) NOT NULL,
Price DECIMAL(10, 2) CHECK (Price > 0),
CategoryID INT,
FOREIGN KEY (CategoryID) REFERENCES
Category(CategoryID)
);
Views in DBMS
Example:View
151
Indexing and Hashing in DBMS
Indexing and hashing are techniques used
in database management systems (DBMS)
to optimize the performance of data
retrieval. They help to reduce the time
complexity of query execution by
minimizing the number of disk I/O
operations required to locate data.
Indexing and Hashing in DBMS
Indexing in DBMS:
Indexing is a data structure technique
used to efficiently retrieve records from a
database table. It creates an auxiliary
structure that provides quick access to
specific data rows based on a key.
Indexing and Hashing in DBMS
What is an Index?
165
Hashing in DBMS
Hashing is a technique used to map data
to a fixed-size address space using a hash
function. It is commonly used in situations
where data retrieval is required in
constant or near-constant time.
Hashing in DBMS
What is Hashing?
Hash Function: A mathematical function
that converts input data into a fixed-size
hash code or hash value.
The hash value is used as an address to
locate the data in a hash table.
Example: If a StudentID is hashed to the
value 5, the corresponding student record
is stored at index 5 in the hash table.
Hashing in DBMS
Types of Hashing:
Static Hashing:
◦ The hash table size is fixed, and the hash
function maps data into a fixed set of buckets.
◦ Advantages:
Simple to implement.
Works well for databases with a fixed amount of
data.
◦ Disadvantages:
Performance degrades as the table becomes full.
Poor handling of dynamic datasets.
Hashing in DBMS
Types of Hashing:
Dynamic Hashing:
◦ The hash table grows or shrinks dynamically
as the dataset changes.
◦ Techniques:
Extendible Hashing: Uses a directory that grows
dynamically as more data is added. Buckets can split
when they overflow.
Linear Hashing: Buckets are split in a predefined
sequence, eliminating the need for a directory.
Hashing in DBMS
Hashing Techniques
◦ Division Method:
The hash value is computed as h(k) = k % n, where
k is the key and n is the number of buckets.
Example: For a key 101 and n = 10, the hash value
is 101 % 10 = 1.
Hashing in DBMS
Hashing Techniques
◦ Multiplication Method:
◦ The hash value is computed as h(k) = floor(n
* (k * A % 1)), where A is a constant.
◦ Reduces the likelihood of collisions compared
to the division method.
◦ Universal Hashing:
◦ Uses a randomized hash function to minimize
the chance of collisions.
Hashing in DBMS
Collision Resolution in Hashing
◦ When two keys hash to the same bucket, a
collision occurs. Several methods are used to
handle collisions:
◦ Chaining:
Each bucket points to a linked list of entries that
hash to the same bucket.
Example
Bucket 0: Key1 -> Key2
Hashing in DBMS
Collision Resolution in Hashing
◦ Open Addressing:
◦ All data is stored within the hash table itself, and
collisions are resolved by probing (searching) for
the next available slot.
◦ Techniques:
◦ Linear Probing: Search sequentially for the next
empty slot.
◦ Quadratic Probing: Search using a quadratic
formula (e.g., 1², 2²).
◦ Double Hashing: Use a second hash function to
determine the step size for probing.
Hashing in DBMS
Advantages of Hashing
◦ Constant-Time Access: Offers O(1) average
time complexity for data retrieval.
◦ Dynamic Data Handling: Suitable for dynamic
datasets when using dynamic hashing techniques.
Disadvantages of Hashing
◦ Collisions: Handling collisions can add
complexity and overhead.
◦ Storage Overhead: Requires additional space
for the hash table and collision resolution
structures.
Indexing vs. Hashing
177
Transaction Management and
Concurrency Control
Transaction management and concurrency
control are crucial components of a
Database Management System (DBMS) to
ensure data consistency, integrity, and
availability in multi-user environments.
Transaction Management and
Concurrency Control
Transactions in DBMS:
What is a Transaction?
◦ A transaction is a sequence of one or more
database operations (such as insert, update,
delete, or retrieve) executed as a single logical
unit of work.
A transaction must maintain the ACID
properties to ensure data integrity.
Transaction Management and
Concurrency Control
ACID Properties
◦ Atomicity:
◦ Ensures that a transaction is executed completely or
not at all.
◦ If a transaction fails, all changes made during the
transaction are rolled back.
◦ Example: Transferring money from one bank account
to another must either complete fully or not occur at
all.
◦ Consistency:
◦ Ensures that the database transitions from one valid
state to another after a transaction.
◦ Example: The total balance in all accounts should
remain unchanged after a transaction.
Transaction Management and
Concurrency Control
ACID Properties
◦ Isolation:
◦ Ensures that concurrently executing transactions
do not interfere with each other.
◦ Example: Two users withdrawing money from the
same account simultaneously should not result in
an inconsistent state.
◦ Durability:
◦ Ensures that once a transaction is committed, its
changes are permanently recorded in the database,
even in the case of system failure.
◦ Example: Once money is transferred between
accounts, the changes must persist even if the
system crashes.
Transaction Management and
Concurrency Control
States of a Transaction:
◦ Active: The transaction is being executed.
◦ Partially Committed: All operations are
completed, but the changes are not yet saved to
the database.
◦ Committed: The transaction has completed
successfully, and the changes are saved.
◦ Failed: An error occurred, and the transaction
cannot proceed.
◦ Aborted: The transaction is rolled back, undoing
all changes made during its execution.
Transaction Management and
Concurrency Control
Concurrency in DBMS:
Concurrency control is the process of
managing simultaneous transaction
execution in a multi-user environment
while ensuring consistency and isolation.
Transaction Management and
Concurrency Control
Problems in Concurrency
◦ Lost Update Problem:
◦ Occurs when two transactions update the same
data item, and one update overwrites the other.
◦ Example:
Transaction T1 reads a value X = 100 and increments it
by 10.
Transaction T2 reads the same X and decrements it by
5.
The final value of X should be 105, but it may incorrectly
become 95 or 110.
Transaction Management and
Concurrency Control
Problems in Concurrency
◦ Dirty Read Problem:
◦ Occurs when a transaction reads uncommitted
changes made by another transaction.
◦ Example: T1 updates a value, and T2 reads the
updated value before T1 commits. If T1 rolls back,
T2 has used invalid data.
◦ Non-repeatable Read:
◦ Occurs when a transaction reads the same data
item multiple times and gets different results.
◦ Example: T1 reads a value, T2 modifies it, and T1
reads it again.
Transaction Management and
Concurrency Control
Problems in Concurrency
◦ Phantom Read:
Occurs when a transaction retrieves a set of rows,
and another transaction inserts or deletes rows,
causing changes in subsequent reads.
Example: T1 reads a list of accounts, T2 adds a new
account, and T1 re-reads the list, finding an extra
account.
Transaction Management and
Concurrency Control
Concurrency Control Techniques:
◦ Lock-Based Protocols
◦ Locks prevent multiple transactions from
accessing the same data item simultaneously.
◦ Types of Locks:
Shared Lock (S): Allows read access but prevents
write access.
Exclusive Lock (X): Allows write access and
prevents both read and write access by other
transactions.
Transaction Management and
Concurrency Control
Concurrency Control Techniques:
◦ Two-Phase Locking (2PL):
◦ Divides the execution of a transaction into two phases:
Growing Phase: Locks are acquired, but no locks are released.
Shrinking Phase: Locks are released, but no new locks are
acquired.
◦ Guarantees serializability but can lead to deadlocks.
◦ Deadlock Handling:
◦ Detection and Recovery: Detect deadlocks and abort
one or more transactions.
◦ Prevention: Ensure that transactions do not enter a
deadlock state by acquiring all necessary locks in advance.
Transaction Management and
Concurrency Control
Timestamp-Based Protocols:
◦ Transactions are assigned timestamps at the
start.
◦ Transactions are executed in the order of
their timestamps.
◦ Rules:
◦ If a transaction tries to access a data item and
its timestamp is older than the data's last
modified timestamp, the transaction is
aborted.
◦ Ensures serializability without locks.
Transaction Management and
Concurrency Control
Multiversion Concurrency Control
(MVCC):
◦ Maintains multiple versions of data items to
allow concurrent read and write operations.
◦ Transactions read the version of the data item
that was committed at the start of the
transaction.
◦ Used in modern databases like PostgreSQL
and MySQL.
Transaction Management and
Concurrency Control
Optimistic Concurrency Control:
◦ Assumes that conflicts are rare.
◦ Transactions execute without
restrictions, and conflicts are checked at
the commit stage.
◦ If a conflict is detected, one transaction
is rolled back.
Transaction Management and
Concurrency Control
Transaction Recovery:
Transaction recovery ensures that the
database remains consistent in the case of
system failures.
Types of Failures
◦ Transaction Failure: Caused by logical errors
(e.g., division by zero) or system errors (e.g.,
deadlocks).
◦ System Failure: Caused by hardware or
software crashes.
◦ Media Failure: Caused by physical damage to
storage media.
Transaction Management and
Concurrency Control
Recovery Techniques:
◦ Undo Logging:
◦ Records old values of data items before they are modified.
◦ Rollback can restore the original values in case of failure.
◦ Redo Logging:
◦ Records new values of data items after they are modified.
◦ Replay can apply the changes after a crash.
◦ Undo-Redo Logging:
◦ Combines undo and redo logging to handle failures at any stage.
◦ Checkpointing:
◦ Periodically saves the current state of the database to reduce
recovery time.
Transaction Management and
Concurrency Control
Isolation Levels in SQL:
◦ DBMSs provide different isolation levels to
balance concurrency and consistency.
◦ Read Uncommitted:
Transactions can read uncommitted data.
May cause dirty reads.
◦ Read Committed:
Transactions cannot read uncommitted data.
Prevents dirty reads but allows non-repeatable
reads.
Transaction Management and
Concurrency Control
Isolation Levels in SQL:
◦ Repeatable Read:
Ensures that data read by a transaction remains
consistent during its execution.
Prevents dirty and non-repeatable reads but allows
phantom reads.
◦ Serializable:
Ensures complete isolation between transactions.
Prevents dirty reads, non-repeatable reads, and
phantom reads.
Most restrictive isolation level.
Transaction Management and
Concurrency Control
Practical Applications:
◦ Banking Systems: Ensure atomic money
transfers and prevent concurrency issues.
◦ E-commerce: Handle multiple users
accessing the same inventory simultaneously.
◦ Reservation Systems: Prevent overbooking
by managing simultaneous bookings.
Week 14
197
NoSQL Databases
NoSQL (Not Only SQL) databases are
designed to handle large volumes of
unstructured, semi-structured, or
structured data. Unlike traditional
relational databases, NoSQL databases
offer flexible schemas, scalability, and high
performance, making them ideal for
modern applications.
What is NoSQL?
A class of databases that provides
mechanisms for storage and retrieval of
data beyond the traditional table
structures of relational databases.
Built for distributed data stores, NoSQL
databases can handle big data and real-
time web applications.
Characteristics of NoSQL
Schema Flexibility:No fixed schema;
data can evolve dynamically.
Suitable for handling unstructured and
semi-structured data.
Scalability:Horizontal scaling (adding
more servers) instead of vertical scaling
(upgrading server capacity).
Characteristics of NoSQL
High Availability:
◦ Ensures continuous availability even during
system failures.
CAP Theorem:
◦ Consistency: Every read gets the most recent
write or an error.
◦ Availability: Every request receives a response
without guarantee of the most recent data.
◦ Partition Tolerance: The system continues to
function despite network partitions.
◦ NoSQL databases often trade off between these
properties based on the use case.
Types of NoSQL Databases
Key-Value Stores
Document Stores
Column-Family Stores
Graph Databases
Types of NoSQL Databases
Types of NoSQL Databases:
◦ Key-Value Stores
◦ Structure:
Data is stored as key-value pairs.
Example: { "userID": "12345" }
◦ Use Cases:
Caching, session storage.
◦ Examples:
Redis, DynamoDB.
Types of NoSQL Databases
Document Stores
◦ Structure:
Data is stored as documents, often in JSON, BSON, or
XML format.
Example:
{
"userID": "12345",
"name": "John Doe",
"orders": [
{"orderID": "001", "amount": 100},
{"orderID": "002", "amount": 200}
]
}
Types of NoSQL Databases
Document Stores
◦ Use Cases:Content management systems,
catalogs, user profiles.
◦ Examples:MongoDB, CouchDB.
Types of NoSQL Databases
Column-Family Stores
◦ Structure:
Data is stored in columns grouped into families.
Example:
RowKey: 12345
Name: John Doe
OrderID: 001, 002
Amount: 100, 200
Example:
(User) -- [LIKES] --> (Product)
(User) -- [FRIEND] --> (User)
◦ Use Cases:
Social networks, recommendation engines, fraud
detection.
◦ Examples:
Neo4j, ArangoDB.
Advantages of NoSQL
Scalability:Easily scales horizontally to
handle massive data.
Flexibility:No rigid schema allows
changes without downtime.
High Performance:Optimized for high-
speed read and write operations.
Advantages of NoSQL
Distributed Architecture:
◦ Built to operate across multiple servers or
clusters.
Handles Big Data:
◦ Suitable for real-time data processing and
large-scale datasets.
Disadvantages of NoSQL
Lack of Standardization:No
standardized query language like SQL.
Eventual Consistency:Some NoSQL
databases prioritize availability and
partition tolerance over consistency.
Complexity in Relationships:Managing
relationships between entities is more
complex than in relational databases.
Limited Maturity:Newer than relational
databases, with less extensive tools and
community support.
NoSQL vs. SQL
Applications of NoSQL:
Social Media:
◦ Storing user data, relationships, and interactions.
◦ Example: Facebook using Cassandra for messaging.
E-commerce:
◦ Managing product catalogs and user profiles.
◦ Example: Amazon DynamoDB for handling shopping cart data.
Real-Time Analytics:
◦ Processing massive datasets with low latency.
◦ Example: Redis for caching.
IoT and Big Data:
◦ Storing and querying sensor data efficiently.
◦ Example: MongoDB and Cassandra.
Content Management:
◦ Managing unstructured data like videos, images, and documents.
◦ Example: Couchbase for media applications.
Querying NoSQL Databases
NoSQL databases do not use SQL but
offer their query mechanisms:
MATCH (u:User)-[:LIKES]->(p:Product)
RETURN u, p;
Week 15
215
Database Recovery Techniques
Database recovery ensures that a
database remains consistent and
operational after system failures, crashes,
or errors. Recovery mechanisms restore
the database to a correct state while
preserving data integrity.
What is Database Recovery?
Database recovery is the process of
restoring a database to a previous
consistent state after a failure.
Undo-Redo Logging:
◦ Combines both undo and redo logs to handle
failures at any stage.
Database Recovery Techniques
Checkpointing
A checkpoint is a snapshot of the database state at
a particular point in time.
Helps reduce recovery time by limiting the log
records that need to be processed.
Steps:
Flush all logs and modified data to stable storage.
Write a checkpoint record to the log.
Example Log Entry:
<CHECKPOINT>
Database Recovery Techniques
Recovery Steps:
◦ Start from the last checkpoint.
◦ Process logs to undo uncommitted
transactions and redo committed
transactions.
Week 16
228
Database Recovery Techniques
Shadow Paging
◦ Uses two copies of the database:
Current Page Table: The current version of the
database.
Shadow Page Table: A backup copy of the
database.
◦ Process:
Changes are made to a copy (shadow page table).
Once the transaction commits, the shadow table
replaces the current table.
Database Recovery Techniques
Advantages:
◦ No logging required.
◦ Ensures atomicity.
Disadvantages:
◦ High storage overhead.
◦ Difficult to implement for large databases.
Database Recovery Techniques
ARIES (Algorithm for Recovery and
Isolation Exploiting Semantics)
◦ ARIES is a popular recovery technique that
combines logging, checkpoints, and a
sophisticated recovery process.
◦ Key Features:
Write-Ahead Logging (WAL): Log records must be
written to stable storage before data changes.
Repeating History: During recovery, ARIES replays all
operations from the logs.
Selective Undo: Only the operations of uncommitted
transactions are undone.
Database Recovery Techniques
Phases:
◦ Analysis: Determine the state of all
transactions at the time of failure.
◦ Redo: Replay operations of committed
transactions.
◦ Undo: Rollback changes made by
uncommitted transactions.
Recovery in Distributed Databases
In distributed systems, transactions span
multiple sites, requiring coordination for
recovery.
Two-Phase Commit (2PC):
◦ Ensures atomicity across multiple sites.
◦ Steps:
Prepare Phase: The coordinator asks participants
if they are ready to commit.
Commit Phase: If all participants agree, the
transaction is committed.
Recovery in Distributed Databases
Disadvantages:
◦ High communication overhead.
◦ Blocking issues during failures.
237
Future Trends in Database
Management Systems (DBMS)
The field of Database Management
Systems (DBMS) continues to evolve
rapidly, driven by the increasing demands
of big data, cloud computing, AI
integration, and real-time analytics.
Emerging trends are reshaping how data
is stored, accessed, and managed.
What Drives DBMS Evolution?
Increasing data volume, variety, and
velocity (3Vs of Big Data).
Growing demand for real-time processing
and analytics.
Advances in technology like AI, IoT, and
cloud computing.
Objectives of Modern DBMS Trends
Enhance performance, scalability, and
reliability.
Improve flexibility for handling diverse
workloads.
Support integration with modern
applications and frameworks.
Future Trends in DBMS
Cloud-Based Databases
◦ Trend: Migration to cloud platforms for
scalable and cost-effective database solutions.
◦ Features:
On-demand scalability.
Managed services for reduced administrative
overhead.
Pay-as-you-go pricing.
◦ Examples:
Amazon RDS, Google BigQuery, Microsoft Azure
SQL Database.
Future Trends in DBMS
Benefits:High availability with minimal
downtime.
Seamless integration with other cloud-
native services.
Future Trends in DBMS
Multi-Model Databases:
Trend: Support for multiple data models (e.g.,
relational, document, graph) in a single system.
Features:
◦ Flexibility to store and query different data types.
◦ Unified access to diverse data models.
Examples:
◦ ArangoDB, Couchbase, Oracle.
Benefits:
◦ Simplifies application development by reducing the
need for multiple database systems.
NoSQL and NewSQL Evolution
NoSQL:
◦ Continued adoption for unstructured and semi-
structured data.
◦ Enhanced query capabilities in NoSQL databases.
◦ Examples: MongoDB, Cassandra, Redis.
NewSQL:
◦ Combines the scalability of NoSQL with the
ACID guarantees of traditional databases.
◦ Examples: CockroachDB, Google Spanner.
NoSQL and NewSQL Evolution
Benefits:Improved performance and
consistency for distributed systems.
Artificial Intelligence and
Machine Learning Integration
Trend: AI-driven databases that leverage
machine learning for optimization.
Applications:
◦ Query optimization using AI algorithms.
◦ Predictive maintenance for database systems.
◦ Automated indexing and schema design.
Artificial Intelligence and
Machine Learning Integration
Examples:Oracle Autonomous
Database.
Benefits:Reduces the need for manual
tuning and management.
Enhances performance and efficiency.
Real-Time Analytics and
Stream Processing
Trend: Increasing demand for real-time data
analysis and decision-making.
Features:
◦ Support for real-time data ingestion and querying.
◦ Integration with streaming platforms like Apache
Kafka.
Examples:
◦ Apache Druid, Snowflake, Google BigQuery.
Benefits:
◦ Enables applications like fraud detection, IoT
monitoring, and personalized recommendations.
Blockchain and Decentralized
Databases
Trend: Integration of blockchain technology into
databases.
Features:
◦ Immutable ledgers for secure and transparent data
storage.
◦ Support for distributed and decentralized systems.
Examples:
◦ BigchainDB, Hyperledger Fabric.
Benefits:
◦ Enhanced data security and auditability.
◦ Suitable for industries like finance, supply chain, and
healthcare.
In-Memory Databases
Trend: Use of in-memory databases for
ultra-fast performance.
Features:
◦ Data stored in RAM instead of traditional
storage.
◦ Optimized for low-latency access.
Examples:
◦ SAP HANA, Redis, Memcached.
Benefits:
◦ Accelerates analytics and real-time processing.
Challenges in Adopting New Trends
Integration with Legacy Systems:
◦ Difficulty in integrating modern DBMS with
existing infrastructure.
Cost:
◦ High cost of adopting and maintaining advanced
systems.
Security Concerns:
◦ Increased attack surface with distributed and
cloud-based databases.
Skill Gap:
◦ Need for specialized knowledge to manage
advanced databases.