Module 4
1- Database Operating Systems
1.1 INTRODUCTION
• Earlier, database systems were built on top of general-purpose operating systems.
• This was not efficient, as general OSs don’t provide special features needed for
databases.
• A database operating system is designed especially for databases, offering better
performance, less overhead, and features like transaction management, concurrency,
etc.
Two approaches:
1. Traditional – Add database features over a general OS (less efficient).
2. Specialized DB OS – All DB functions are built directly into the OS (more efficient).
1.2 WHAT IS DIFFERENT?
General-purpose OS supports:
• Process creation, memory, and file management.
• Buffer, virtual memory, I/O, protection, etc.
BUT for databases:
• It is not well suited to huge, complex, persistent data.
• A general-purpose file system is not optimized for very large structured files and lacks DB-specific features.
Key DB requirements:
• Buffer Management: OS buffers (the file-system cache) are not tuned for database access patterns.
• Crash Recovery: DBs need certain pages (e.g., log pages) to be written to disk in a specific order, which a general OS does not guarantee.
• Page Replacement: Databases maintain their own buffer pools and page-replacement policies (such as LRU) for better performance.
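The LRU page-replacement idea mentioned above can be sketched in Python. This is an illustrative toy, not a real buffer pool; the page IDs and the `load_from_disk` callback are hypothetical:

```python
from collections import OrderedDict

class LRUBufferPool:
    """Toy sketch of an LRU page-replacement policy for a DB buffer pool."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> page data, least recent first

    def get(self, page_id, load_from_disk):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)      # mark as most recently used
        else:
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)   # evict least recently used
            self.pages[page_id] = load_from_disk(page_id)
        return self.pages[page_id]

pool = LRUBufferPool(capacity=2)
pool.get("P1", lambda p: f"data of {p}")
pool.get("P2", lambda p: f"data of {p}")
pool.get("P1", lambda p: f"data of {p}")   # P1 becomes most recently used
pool.get("P3", lambda p: f"data of {p}")   # evicts P2, the least recently used
print(list(pool.pages))                    # ['P1', 'P3']
```

A real DBMS would also weigh which pages are dirty and which are pinned by active transactions before evicting, which this sketch ignores.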
2- REQUIREMENTS OF A DATABASE OPERATING SYSTEM
To support database goals, a DB OS must provide:
1. Transaction Management
• A transaction is a program that performs a group of DB operations.
• It should follow ACID properties (Atomicity, Consistency, Isolation, Durability).
• Must handle:
1. Multiple users running transactions at the same time (concurrency control).
2. Failures during transactions (recovery support).
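The atomicity and consistency requirements above can be illustrated with a minimal sketch of an all-or-nothing transfer. The `Account` class and balance check are hypothetical, chosen only to show rollback on failure:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

def transfer(src, dst, amount):
    """All-or-nothing transfer: either both updates happen or neither."""
    old_src, old_dst = src.balance, dst.balance    # snapshot for rollback
    try:
        src.balance -= amount
        if src.balance < 0:
            raise ValueError("insufficient funds")  # consistency check fails
        dst.balance += amount
    except Exception:
        src.balance, dst.balance = old_src, old_dst  # roll back on failure
        raise

a, b = Account(100), Account(50)
transfer(a, b, 30)
print(a.balance, b.balance)   # 70 80
try:
    transfer(a, b, 500)       # fails mid-way, state is rolled back
except ValueError:
    pass
print(a.balance, b.balance)   # 70 80 - unchanged, as atomicity requires
```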
2. Support for Persistent, Complex Data
• DBs deal with huge, structured data stored on disks.
• OS must allow defining and accessing complex records and files efficiently.
• DBs need to manage I/O efficiently since disk access is slow.
• OS should place related data blocks nearby on the disk for faster access.
3. Buffer Management
• DB data is stored on disk but accessed via memory buffers.
• When a page is needed, it's brought into memory from disk.
• If memory is full, one page must be removed (page replacement).
• A DB OS keeps track of:
1. Dirty pages (pages modified in memory but not yet written to disk).
2. A log of transaction operations (for rollback and recovery).
3. Intention lists and flags that let updates be applied safely.
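The dirty-page and log bookkeeping described above can be sketched as a toy buffer manager. This is a simplified illustration of the write-ahead idea (log the change before applying it); the page IDs and method names are hypothetical:

```python
class BufferManager:
    """Toy sketch: track dirty pages and log updates before applying them."""
    def __init__(self):
        self.pages = {}       # page_id -> contents held in memory
        self.dirty = set()    # pages modified since the last flush
        self.log = []         # transaction log entries for rollback/recovery

    def write(self, txn_id, page_id, old, new):
        self.log.append((txn_id, page_id, old, new))  # log first (write-ahead)
        self.pages[page_id] = new
        self.dirty.add(page_id)

    def flush(self):
        flushed = sorted(self.dirty)   # pretend these pages reach disk now
        self.dirty.clear()
        return flushed

bm = BufferManager()
bm.write("T1", "P7", old="x", new="y")
print(bm.dirty)     # {'P7'}
print(bm.flush())   # ['P7']
print(len(bm.log))  # 1 - the log survives for recovery
```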
3- TRANSACTION PROCESSING MODEL
Transaction Processing in Distributed Database Systems
What is a Transaction?
A transaction is a small program or task that reads and writes data in a database.
It should:
• Keep the database consistent (valid and accurate).
• Complete within a finite time.
• Work as one logical unit (either fully completes or doesn’t happen at all).
What is Transaction Processing?
In a Distributed Database System (DDBS), data is not stored in one place – it is spread across
many sites or computers connected through a network.
When a user runs a transaction (like a bank transfer or product update), the data involved in
that transaction may exist on different sites. So, the system splits the transaction into smaller
parts called sub-transactions, and sends them to the appropriate sites for execution.
Components in Distributed Transaction Execution:
• TM (Transaction Manager): Starts and manages transactions; sends sub-transactions to the correct sites.
• Scheduler: Decides the order in which operations are executed at a site.
• DM (Data Manager): Executes the read/write operations on the database.
• D1, D2, ..., DN: Local databases at the different sites.
Steps in Transaction Processing:
1. TM receives a user’s transaction.
2. TM divides the transaction into sub-transactions (based on which data is stored where).
3. TM sends tasks to the Schedulers and Data Managers (DMs) at different sites.
4. DMs execute read/write operations.
5. Results are collected and sent back to TM.
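Step 2 above (splitting a transaction by data location) can be sketched as follows. The catalog mapping items to sites is hypothetical, invented only to show the grouping:

```python
# Hypothetical data-location catalog: which site stores which data item.
CATALOG = {"X": "site1", "Y": "site2", "Z": "site1"}

def split_transaction(operations):
    """Sketch of a TM grouping operations into per-site sub-transactions."""
    subs = {}
    for op, item in operations:
        site = CATALOG[item]            # look up where the item lives
        subs.setdefault(site, []).append((op, item))
    return subs

txn = [("read", "X"), ("write", "Y"), ("write", "Z")]
print(split_transaction(txn))
# {'site1': [('read', 'X'), ('write', 'Z')], 'site2': [('write', 'Y')]}
```

In a real DDBS the TM would then ship each sub-transaction to its site's scheduler and DM, and collect the results.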
Advantages of DDBS Transaction Model:
• Improved performance: Transactions can run in parallel.
• Easy expansion: You can add new sites without stopping the system.
• Reliability: If one site fails, others still work.
• Large user base support: Can handle more users than a single-site DB.
4- Synchronization Primitives
What are Synchronization Primitives?
• In database systems, synchronization primitives are basic tools or techniques used to
control access to shared data when multiple transactions are happening at the same time
(concurrently).
• They help ensure that transactions do not interfere with each other and the database
stays consistent and correct.
There are two main types:
1. Locks
What is a Lock?
A lock is a control mechanism that prevents other transactions from accessing a data object
(like a record or file) while one transaction is using it.
• Every data object (e.g., a row or file) can be locked before use.
• A transaction must lock a data object before reading or writing to it.
Types of Locks:
1. Exclusive Lock (X-lock):
• Only one transaction can access the data.
• No other transaction can read or write while it's locked.
• Used when the transaction wants to update/write the data.
2. Shared Lock (S-lock):
• Multiple transactions can access the data at the same time, but only for
reading.
• No one can write when it's in shared mode.
Use: To make sure data doesn’t get corrupted when many users try to read/write at the same
time.
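The S-lock/X-lock compatibility rules above can be sketched for a single data object. The class and transaction names are hypothetical; real lock managers also queue waiters rather than just refusing:

```python
class LockEntry:
    """Sketch of shared (S) vs exclusive (X) lock rules on one data object."""
    def __init__(self):
        self.readers = set()   # transactions holding an S-lock
        self.writer = None     # transaction holding the X-lock, if any

    def acquire_s(self, txn):
        if self.writer is not None:
            return False              # a writer blocks all readers
        self.readers.add(txn)
        return True

    def acquire_x(self, txn):
        if self.writer is not None or self.readers - {txn}:
            return False              # any other holder blocks an X-lock
        self.readers.discard(txn)     # upgrade our own S-lock if present
        self.writer = txn
        return True

lock = LockEntry()
print(lock.acquire_s("T1"))   # True  - reading is allowed
print(lock.acquire_s("T2"))   # True  - many readers can share
print(lock.acquire_x("T3"))   # False - readers block the writer
```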
2. Timestamps
What is a Timestamp?
A timestamp is a unique number given to each transaction to show the order in which they
occur.
• Timestamps are created in increasing order, meaning a newer transaction will always
have a higher timestamp than an older one.
• They help in deciding which transaction should go first when there is a conflict.
Example:
• Suppose Transaction T1 has timestamp = 5
• Transaction T2 has timestamp = 10
• T1 is older than T2.
So, in case of a conflict, T1 will get priority, and T2 may have to wait, rollback, or retry.
Properties of Timestamps:
1. Uniqueness:
• No two transactions get the same timestamp.
• Ensures clear ordering.
2. Monotonicity:
• Time values always increase (don’t go backward).
• Prevents confusion in ordering.
Use of Timestamps in Concurrency Control:
Timestamps help in:
• Ordering transactions automatically.
• Ensuring serializability (safe execution as if one by one).
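The timestamp properties above (uniqueness, monotonicity, older-wins conflict resolution) can be sketched with a simple counter. The names and the restart policy are illustrative only:

```python
import itertools

_counter = itertools.count(1)    # monotonically increasing timestamp source

class Txn:
    def __init__(self, name):
        self.name = name
        self.ts = next(_counter)  # unique and always increasing

def resolve_conflict(t1, t2):
    """Older transaction (smaller timestamp) wins; the younger one waits or restarts."""
    winner = t1 if t1.ts < t2.ts else t2
    loser = t2 if winner is t1 else t1
    return winner, loser

a = Txn("T1")   # ts = 1, the older transaction
b = Txn("T2")   # ts = 2, the younger transaction
winner, loser = resolve_conflict(a, b)
print(winner.name, loser.name)   # T1 T2
```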
5- Concurrency Control Algorithms
1. Completely Centralized Algorithm (CCA)
Idea: A central site manages and performs all updates in the system.
How it Works:
• Every site sends update requests to the central site.
• The central site processes and broadcasts the result to other sites.
Advantages:
• Simple to implement.
• Centralized control ensures consistency.
Disadvantages:
• Central site is a single point of failure.
• Can become a performance bottleneck.
Example:
• Site A wants to update X.
• Sends to central site → central site updates X → informs all other sites.
Centralized control, but central site failure stops everything.
2. Centralized Locking Algorithm (CLA)
Idea: Each site performs its own transactions, but locking is done centrally.
How it Works:
• Before accessing data, a site sends a lock request to the central lock manager.
• Lock is granted if no conflict; otherwise, the site waits in queue.
• After update, site releases the lock and sends an update message.
Advantages:
• Supports distributed transaction execution.
• Centralized lock avoids lock inconsistencies.
Disadvantages:
• Still depends on central site for locks.
• Not crash-resistant; if central site fails, system halts.
Example:
• Site A asks to lock Y → central site checks and grants it.
• After use, Site A releases the lock → others can then access Y.
Better than CCA, but still fails if lock manager crashes.
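The CLA request/queue/release flow above can be sketched as a toy central lock manager. The site and item names match the example; the class and methods are hypothetical:

```python
from collections import deque

class CentralLockManager:
    """Sketch of CLA: one manager grants locks; conflicting requests queue."""
    def __init__(self):
        self.holders = {}   # item -> site currently holding its lock
        self.waiting = {}   # item -> queue of sites waiting for it

    def request(self, site, item):
        if item not in self.holders:
            self.holders[item] = site
            return "granted"
        self.waiting.setdefault(item, deque()).append(site)
        return "queued"

    def release(self, site, item):
        assert self.holders.get(item) == site
        queue = self.waiting.get(item)
        if queue:
            self.holders[item] = queue.popleft()  # hand lock to next waiter
        else:
            del self.holders[item]

clm = CentralLockManager()
print(clm.request("A", "Y"))   # granted
print(clm.request("B", "Y"))   # queued  - conflict, B waits
clm.release("A", "Y")
print(clm.holders["Y"])        # B       - lock passes to the next waiter
```

Note that the sketch makes the weakness visible: every `request` goes through one object, so if that manager is unreachable, no site can lock anything.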
3. INGRES Primary-Site Locking Algorithm
Idea: Each data object has a designated primary site where all updates for that object occur.
How it Works:
• Each site runs transactions locally.
• Updates are sent to the primary site of the data object for final execution.
• Slave processes at each site prepare updates and send them to the master process at the primary site.
Advantages:
• Distributes the workload.
• Avoids bottleneck of a single central site.
Disadvantages:
• Communication delay due to sending updates to primary site.
• Possible inconsistency if site crashes mid-process.
Example:
• Site B wants to update Z → sends to Site A (primary for Z) → update done.
Load is shared, but adds communication delay.
4. Two-Phase Locking Algorithm (2PL)
Idea: Transactions lock all needed data items before accessing them and release all locks only
after completion.
How it Works:
• Phase 1 (Growing): Locks are acquired.
• Phase 2 (Shrinking): Locks are released.
• No new lock is allowed once a lock is released.
Advantages:
• Ensures serializability (safe transaction ordering).
• Widely used and reliable.
Disadvantages:
• Can cause deadlocks.
• Can delay transactions if a site holding a lock crashes.
Example:
• T1: locks X, Y, performs operations → releases both.
• T2: waits until T1 finishes and releases locks.
Guarantees consistency, but deadlocks possible.
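The growing/shrinking rule of 2PL can be sketched by tracking whether a transaction has started releasing locks. This is an illustrative toy (the class name and error message are invented); it enforces only the phase rule, not deadlock handling:

```python
class TwoPhaseTxn:
    """Sketch of the 2PL rule: no new lock may be acquired after any release."""
    def __init__(self):
        self.locks = set()
        self.shrinking = False   # flips to True at the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after releasing")
        self.locks.add(item)     # growing phase

    def unlock(self, item):
        self.shrinking = True    # growing phase is over for good
        self.locks.discard(item)

t1 = TwoPhaseTxn()
t1.lock("X")
t1.lock("Y")        # still in the growing phase
t1.unlock("X")      # shrinking phase begins
try:
    t1.lock("Z")    # illegal under 2PL
except RuntimeError as e:
    print(e)        # 2PL violation: cannot lock after releasing
```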