Week 7 Lecture
Introduction to Databases
1
Databases
Transaction Management Overview
Chapter 16
2
Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• The buffer management system has a crucial role in optimizing data access by efficiently managing the storage and retrieval of data pages in memory.
[Figure: layered architecture: Queries/Answers, Query Optimization and Execution, Relational Operators, Buffer Management, DB]
3
Why concurrency?
• Concurrent execution of user programs is essential for good DBMS performance.
• Throughput (transactions per second): concurrent execution lets the DBMS process multiple transactions simultaneously, with transactions at different stages of execution, so more transactions complete per unit of time.
• Latency (average response time per transaction)
• Multi-core and single-core
• Because disk accesses are frequent and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
• In summary, concurrent execution is a key strategy for optimizing DBMS performance: it makes efficient use of resources, maintains high throughput, and reduces latency, providing a more responsive database environment.
7
Example 1
• Consider three queries:
• Q1: UPDATE Account SET savings = savings + 100 WHERE customer_id = 1;
• Q2: UPDATE Account SET savings = savings - 300 WHERE customer_id = 2;
• Q3: UPDATE Account SET savings = savings - 400 WHERE customer_id = 3;
• Now consider this new query:
• Q4: SELECT MIN(savings) FROM Account;
• Suppose the initial savings were 900, 1000 and 1000:
• Q4 returns 900 if run before Q1, Q2, and Q3
• Q4 returns 1000 if run after Q1
• Q4 returns 700 if run after Q1 and Q2
• Q4 returns 600 if run after Q1, Q2 and Q3
8
Example 1 (continued)
• The order of operations is important
• The user will need a way to specify the order
9
Transactions
In a database management system (DBMS), a transaction is a logical unit of work performed against a database: a sequence of one or more database operations, such as reads or writes, executed as a single, indivisible unit. The primary goal of a transaction is to ensure the consistency, integrity, and isolation of the database.
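For example, a transfer between two of the accounts from Example 1 can be issued as one indivisible unit (a minimal sketch; BEGIN/COMMIT syntax as in PostgreSQL, amounts chosen for illustration):
    BEGIN;                                                              -- start the transaction
    UPDATE Account SET savings = savings - 100 WHERE customer_id = 2;
    UPDATE Account SET savings = savings + 100 WHERE customer_id = 1;
    COMMIT;                                                             -- both updates become visible together, or not at all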
11
ACID Properties
• Atomicity: a transaction is treated as a single, indivisible unit of work. Either all the changes made by the transaction are committed to the database, or none of them are. If any part of the transaction fails, the entire transaction is rolled back to its original state (illustrated below).
• Consistency: transactions bring the database from one consistent state to another. The integrity constraints and business rules of the database are maintained, ensuring that the database remains in a valid state throughout the transaction.
• Isolation: transactions are isolated from each other, meaning that the intermediate state of a transaction is not visible to other transactions until it is committed. This isolation prevents interference and conflicts between concurrently executing transactions.
• Durability: once a transaction is committed, its effects on the database persist even in the event of a system failure. The changes made by committed transactions survive system crashes or restarts.
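A minimal sketch of atomicity, assuming the same Account table as in Example 1: if anything goes wrong mid-transaction, ROLLBACK undoes every change made so far.
    BEGIN;
    UPDATE Account SET savings = savings - 500 WHERE customer_id = 3;
    -- suppose a later step fails (e.g. a rule forbids negative savings)
    ROLLBACK;   -- the earlier UPDATE is undone; the database returns to its previous consistent state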
12
Databases
Concurrency Control
Chapter 17
18
Concurrency Control
19
Transaction Schedules
Convention 1
• A schedule is a sequence of actions on data from one or more transactions
• Actions are: Begin, Read (R), Write (W), Commit (C) and Abort (A)
[Example: a two-column schedule for T1 and T2 with actions BEGIN, R(A), W(A), R(B)]
21
Serial Schedules
• A serial schedule is a schedule where each transaction runs from beginning to end without any interleaving with other transactions
• For two transactions T1 and T2 there are two possible serial schedules: T1 before T2 (Schedule 1), and T2 before T1 (Schedule 2)
24
Serializable Schedules
• A schedule S is serializable if S is equivalent to some serial schedule.
• Consider the two transactions below:
• T1: transfer 100 dhs from A to B
• T2: add 6% to both A and B
• Serial schedule 1, when T1 is followed by T2:
• A = 1.06 * (A - 100)
• B = 1.06 * (B + 100)
• Serial schedule 2, when T2 is followed by T1:
• A = (1.06 * A) - 100
• B = (1.06 * B) + 100
• Different results, but both are valid serial schedules! (see the worked numbers below)
• In summary, a serial schedule is a straightforward, sequential execution of transactions, while a serializable schedule allows for concurrency among transactions but ensures that the final result is equivalent to some serial execution. Serializable schedules balance the need for concurrency in database systems with the requirement of maintaining data consistency and integrity.
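• A quick check with assumed starting balances A = 1000 and B = 500:
• Schedule 1 (T1 then T2): A = 1.06 * (1000 - 100) = 954, B = 1.06 * (500 + 100) = 636
• Schedule 2 (T2 then T1): A = (1.06 * 1000) - 100 = 960, B = (1.06 * 500) + 100 = 630
• Both outcomes are consistent (each adds 6% of 1500, so A + B = 1590 either way), but they differ; a serializable schedule only has to be equivalent to one of them.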
25
Conflicting Operations
• Challenging to determine if two schedules leave the DB in the same final
state
• Instead of a strict equivalence check, adopt a more conservative one:
• No false positives, but possible false negatives
• Sacrifice concurrency for correctness
• Use notion of “conflicting” operations (read/write)
• Two operations conflict if they:
• Belong to different transactions
• Apply on the same object
• At least one of them is a write
• The order of non-conflicting operations does not impact the final state of the
DB
• Only focus on conflicting operations (illustration below)
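• Illustration (a made-up four-action schedule): in R1(A), W2(A), R1(B), W2(C)
• R1(A) and W2(A) conflict: different transactions, same object, one of them is a write
• R1(B) and W2(C) do not conflict: different objects, so swapping their order cannot change the final state of the DB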
28
Dependency Graph
• Dependency graph:
• One node per Xact
• Edge from Ti to Tj if:
• An operation Oi of Ti conflicts with an operation Oj of Tj
• Oi appears earlier in the schedule than Oj
• Theorem: a schedule is conflict serializable if and only if its dependency graph is acyclic
35
Example
• A schedule that is not conflict serializable:
[Figure: schedule of T1 and T2, and its dependency graph with edges between T1 and T2 labeled A and B, forming a cycle]
• The cycle in the graph reveals the problem: the output of T1 depends on T2, and vice-versa.
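• One interleaving with this property (an assumed concrete instance, for illustration):
T1: R(A), W(A),                               R(B), W(B), Commit
T2:              R(A), W(A), R(B), W(B), Commit
• T1's W(A) precedes T2's R(A)/W(A), giving the edge T1 → T2 (on A); T2's W(B) precedes T1's R(B)/W(B), giving the edge T2 → T1 (on B); hence the cycle.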
38
View Serializability
• Schedules S1 and S2 are view equivalent if:
• Same initial reads: If Ti reads initial value of A in S1, then Ti also reads initial
value of A in S2
• Same dependent reads: If Ti reads value of A written by Tj in S1, then Ti also
reads value of A written by Tj in S2
• Same winning writes: If Ti writes final value of A in S1, then Ti also writes final
value of A in S2
• View Serializability allows conflict serializable schedules + blind writes (illustration below)
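• Illustration (a standard blind-write schedule, not from the original slide):
T1: R(A),            W(A)
T2:        W(A)
T3:                         W(A)
• This schedule is view equivalent to the serial order T1, T2, T3 (T1 reads the initial A; T3 performs the final write), so it is view serializable
• Its dependency graph has the cycle T1 → T2 → T1 (R1(A) before W2(A), and W2(A) before W1(A)), so it is not conflict serializable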
39
Summary
• Transactions are important:
• Concurrency
• Recovery
• Xact Properties
Approach #1: Pessimistic Concurrency Control
• Prevent anomalies before they happen
• Example: Locking
• The most common approach to enforce serializability in DBMSs, because it is best for environments with high contention
Approach #2: Optimistic Concurrency Control
• Correct anomalies after they happen
• Examples: timestamp ordering, multi-version, validation, snapshot isolation
Summary
• Types of Conflicts
• Read-Write Conflicts (“Unrepeatable Reads”)
• Example: T1: R(A), R(A), W(A), C interleaved with T2: R(A), W(A), C, where T2 writes A and commits between T1's two reads, so T1 sees two different values of A
Summary
• Types of Conflicts (notes):
• Write-Read Conflict: occurs when one transaction reads a data item that another transaction has written but not yet committed. The reading transaction may see changes that are later rolled back (a “dirty read”), leading to inconsistencies.
• Write-Write Conflict: occurs when two transactions write to the same data item, so one update can silently overwrite the other (a “lost update”).
• Notions to guarantee concurrency correctness (notes):
• Locking: transactions acquire locks on data items, preventing other transactions from accessing or modifying the same data concurrently.
• Isolation Levels: setting the isolation level of transactions determines the degree to which they are isolated from each other, minimizing conflicts.
• By addressing these conflicts, database systems aim to balance concurrent execution for improved performance with the consistency and integrity of the data.
45
Summary
• Types of Conflicts
• Read-Write Conflicts (“Unrepeatable Reads”)
• Write-Read Conflicts (“Dirty Reads”)
• Write-Write Conflicts (“Lost Updates”)
• Notions to guarantee concurrency correctness:
• Serializability
• Conflict Serializability
• View Serializability
[Figure: containment of schedule classes: Serial Schedules ⊂ Conflict Serializable Schedules ⊂ View Serializable Schedules (conflict serializable + blind writes) ⊂ All Serializable Schedules]
47
Locking
• Two main types of locks:
• Shared Lock (S-LOCK): allows multiple transactions to read the same object at the same time.
• Exclusive Lock (X-LOCK): allows a transaction to modify an object.
• Lock Compatibility Matrix:
              S-Lock   X-Lock
    S-Lock    yes      no
    X-Lock    no       no
• Any number of transactions can hold an S-LOCK on an object
• But if a transaction holds an X-LOCK on an object, no other transaction may hold any lock on this object (see the SQL sketch below)
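In SQL, row-level lock requests can also be made explicit (PostgreSQL-style syntax, shown only as an illustration; in most systems locks are acquired implicitly, and the Account table is from Example 1):
    BEGIN;
    SELECT savings FROM Account WHERE customer_id = 1 FOR SHARE;    -- S-lock: other readers are still allowed
    COMMIT;

    BEGIN;
    SELECT savings FROM Account WHERE customer_id = 1 FOR UPDATE;   -- X-lock: blocks all other lock requests on the row
    UPDATE Account SET savings = savings + 100 WHERE customer_id = 1;
    COMMIT;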
48
Two-Phase Locking (2PL)
• Two main types of locks: Shared Lock (S-LOCK),
Exclusive Lock (X-LOCK)
• A Xact cannot acquire new locks after releasing any lock (a growing phase followed by a shrinking phase)
49
Two-Phase Locking (2PL)
• The most common approach to enforce conflict serializability
• It can be proved that the transactions can be serialized in the
order of their lock points (i.e. the point where a transaction
acquired its final lock).
• Pessimistic Approach:
• Obtains locks to prevent conflicts
• Some locks may have been unnecessary
• Risk of deadlocks: a deadlock is a situation in which two or more
transactions are waiting for one another to give up locks.
• Risk of cascading aborts: the abort of one transaction requires the abort of another (illustrated below)
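• Illustration (an assumed schedule under non-strict 2PL):
T1: X-LOCK(A), W(A), UNLOCK(A),              ABORT
T2:                             S-LOCK(A), R(A)     → must also abort, since it read T1's uncommitted write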
50
Two-Phase Locking (2PL)
• Example of a deadlock
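• A representative interleaving (lock names and order assumed for illustration):
T1: X-LOCK(A), W(A),                  request X-LOCK(B) → must wait for T2
T2:                 X-LOCK(B), W(B),  request X-LOCK(A) → must wait for T1
• Each transaction holds a lock the other one needs, so neither can ever proceed.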
51
Strict Two-Phase Locking (Strict 2PL)
52
Lock Management
• Lock and unlock requests are handled by the lock manager
• Lock table entry:
• Number of transactions currently holding a lock
• Type of lock held (shared or exclusive)
• Pointer to queue of lock requests
• Locking and unlocking have to be atomic operations
• Lock upgrade: a transaction that holds a shared lock can upgrade it to an exclusive lock
53
Deadlocks
54
Deadlock Avoidance
55
Deadlock Detection
56
Deadlock Detection (Continued)
Example:
[Figure: example waits-for graphs over transactions T1, T2, T3 and T4]
57
Multiple-Granularity Locks
[Figure: lock granularity hierarchy, down to the Attribute level]
58
Solution: New Lock Modes, Protocol
• Allow Xacts to lock at each level, but with a special protocol using
new “intention” locks:
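• IS (intention shared), IX (intention exclusive), SIX (S + IX): the standard intention modes, summarized here as an illustration
• To read one record: IS on the database and the table, then S on the record
• To update one record: IX on the database and the table, then X on the record
• To scan a table and update a few records: SIX on the table, then X on the records actually changed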
60
Examples
61
Dynamic Databases
62
The Problem
• T1 implicitly assumes that it has locked the set of all sailor records
with rating = 1.
• Assumption only holds if no sailor records are added while T1 is
executing!
• Need some mechanism to enforce this assumption. (Index locking and
predicate locking.)
• Example shows that conflict serializability guarantees serializability
only if the set of objects is fixed!
• Phantom Problem!
63
The Phantom Problem
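An illustrative phantom scenario (the Sailors schema and values are assumed, following the rating = 1 example above):
    T1: SELECT MAX(age) FROM Sailors WHERE rating = 1;    -- returns, say, 71
    T2: INSERT INTO Sailors (sid, name, rating, age) VALUES (99, 'Pat', 1, 96); COMMIT;
    T1: SELECT MAX(age) FROM Sailors WHERE rating = 1;    -- now returns 96: a “phantom” row appeared,
                                                          -- even though T1 never changed any row it had read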
65
Isolation Levels in SQL
• 1. “Dirty reads”
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
• 2. “Committed reads”
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
• 3. “Repeatable reads”
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
• 4. Serializable transactions
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
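A sketch of how the weakest level differs from the others (values assumed; the Account table is from Example 1; exact behaviour varies by DBMS, e.g. PostgreSQL treats READ UNCOMMITTED like READ COMMITTED):
    -- Session 1
    BEGIN;
    UPDATE Account SET savings = savings + 100 WHERE customer_id = 1;   -- not committed yet

    -- Session 2
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT savings FROM Account WHERE customer_id = 1;   -- may see the uncommitted +100 (a dirty read)
    COMMIT;
    -- READ COMMITTED and stronger levels only ever return committed values;
    -- SERIALIZABLE also rules out unrepeatable reads and phantoms.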
66
Isolation Level: Dirty Reads
67
Isolation Level: Read Committed
68
Isolation Level: Repeatable Read
69
Isolation Level: Serializable
70
Summary (Contd.)
• The correctness criterion for isolation is “serializability”
• We use “conflict serializability”: more conservative but easier to enforce
• Two Phase Locking and Strict 2PL:
• Locks implement the notions of conflicts
• Locks are managed by the lock manager
• Deadlocks can happen: we can either prevent them or detect them
• Multiple granularity locking:
• Allows flexible tradeoff between the lock scope in DB and the number of lock entries in the
lock table
• Additional
• S2PL is pessimistic; there exist optimistic CC protocols, e.g. Timestamp CC
• Index latching, phantoms, etc.
71