Generalized Isolation in Database
Yang Wang
Revision
• What does “Isolation” mean?
– How a transaction’s internal state/execution is
visible to other transactions
Revision
• Strong isolation: Serializability
– Concurrent execution of transactions is equivalent
to a serial execution.
• Weak isolation: No control at all
– We have learned it can incur a lot of problems.
Revision
• Strong isolation: Serializability
– Concurrent execution of transactions is equivalent
to a serial execution.
In practice, people define a series
of other isolation levels in the middle.
• Weak isolation: No control at all
– We have learned it can incur a lot of problems.
Common isolation levels
• (Strong)
• Strict Serializability
• Serializability
• Repeatable reads • Snapshot Isolation
• Read committed
• Read uncommitted
• No isolation at all
• (Weak)
Why so many isolation levels?
• They explore different trade-offs between
parallelism and ease of programming
• Stronger isolation: easier to program with
• Weaker isolation: higher parallelism
Outline
• Definition of different isolation levels
• Implementation of different isolation levels
Read Uncommitted
• Allow a transaction to read uncommitted data
from another transaction
– This phenomenon is called dirty reads.
• Common implementation: lock an object
before accessing it and unlock it right after
accessing it
• This is a very weak isolation level. Almost no
one uses it.
Read Committed
• A transaction reads only committed data.
– No dirty reads.
• Common implementation: for a write, hold lock
to the end of transaction; for a read, release lock
right after accessing the object
• Can you find an execution log that is permitted by
Read Committed but not by Serializable?
Read Committed
• T1: W1(a)
• T2: R2(a) R2(a)
• Execution log R2(a) W1(a) R2(a) is permitted by
Read Committed, but not by Serializable
Repeatable Read
• This is a controversial term in history.
• Non-repeatable read: a transaction reads an
object twice and gets different values
– See the previous example
• Definition 1: read committed + prevent non-
repeatable read
Repeatable Read
• Definition 1: read committed + prevent non-
repeatable read
• Can you show an execution log which is
permitted by Repeatable Read but not by
Serializable?
Repeatable Read
• T1: R1(a) W1(b)
• T2: R2(b) W2(a)
• Execution log R1(a) R2(b) W1(b) W2(a) is permitted by
Repeatable Read (assuming we use the Definition 1) but
not by Serializable.
• This phenomenon is called a write screw.
– From wiki: two transactions (T1 and T2) concurrently read an
overlapping data set (e.g. values V1 and V2), concurrently make
disjoint updates (e.g. T1 updates V1, T2 updates V2), and finally
concurrently commit, neither having seen the update
performed by the other.
Repeatable Read
• Definition 2: based on locking
– Hold both write lock and read lock to the end of
transaction
• You should find this is the same as strict two-
phase locking and thus if we use this
definition, Repeatable Read is very similar to
Serializability
Serializability
• Definition: Repeatable Read definition 2 +
support range operation
• What is a range operation?
– If there are only Read or Write in a transaction,
Repeatable Read definition 2 = Serializability.
– A SQL transaction can do a query on a range: e.g.
select name whose age < 65
– Locking individual objects cannot achieve
Serializability for such queries. Can you show an
example?
Phantom read
• T1: select name whose age < 60
• T2: insert user(name=‘A’, age=30); insert user(name=‘B’,
age=40)
• Serializability: T1 should see both A and B or neither
• However, if T1 simply locks all objects one by one, it
cannot achieve Serializability: it is possible T2 inserts A
first, then T1 scans all, then T2 inserts B.
• This phenomenon is called phantom read.
– From wiki: A phantom read occurs when, in the course of a
transaction, new rows are added by another transaction to
the records being read.
• Solution: range lock. We will not discuss details.
Strict Serializability
• Definition: Serializability + real time guarantee
• Real time guarantee: if T1 commits before T2
commits, then T1 must be serialized before T2
in the serialization graph
Snapshot Isolation
• Definition:
– A transaction appears to operate on a personal
snapshot of the database, taken at the start of the
transaction.
– Will commit only if its updated objects are not
changed by other transactions.
• Can you find one execution log that is
permitted by Snapshot Isolation but not by
Serializable?
Snapshot Isolation
• T1: R1(a) W1(b)
• T2: R2(b) W2(a)
• Execution log R1(a) R2(b) W1(b) W2(a) is
permitted by Snapshot Isolation but not by
Serializable.
• Snapshot Isolation does not prevent write
screw.
• Can it prevent phantom read? Yes.
How to define isolation levels?
• There has been a long discussion in history.
• Earliest version (ANSI/ISO SQL-92):
Dirty Non-repeatable Write Phantom
read read screw read
Read uncommitted Yes Yes Yes Yes
Read committed No Yes Yes Yes
Repeatable read 1 No No Yes Yes
Repeatable read 2 No No No Yes
Snapshot Isolation No No Yes No
Serializable No No No No
How to define isolation levels?
• In general, using natural language as definition
is not accurate
– What does “write skew” mean for 3 transactions?
• Hal Berenson, Phil Bernstein, Jim Gray, Jim
Melton, Elizabeth O’Neail, Patrick O’Neil, A
critique of ANSI SQL isolation levels. SigMOD
95.
Jim Gray
• Defined ACID
• Founder of relational DB
• Turing Award 1998: For seminal
contributions to database and transaction
processing research and technical
leadership in system implementation.
• Disappeared in 2007.
How to define isolation levels?
• Second edition: based on locks
– Read uncommitted: lock before access and unlock
after access
– Read committed: hold write lock to end; release
read lock after access
– Repeatable Read: hold both write lock and read
lock to end
– Serializable: add range lock to Repeatable Read
How to define isolation levels?
• Problem of lock-base definition:
– There are implementations not based on locks.
– E.g. timestamp based.
– In general, it is also bad to define a concept based
on implementation. It is too restrictive.
• Third edition: Atul Adya, Barbara Liskov,
Patrick O’ Neil, Generalized Isolation Level
Definitions, ICDE 2000
Barbara Liskov
• One of the first women in US to get a
Ph.D. in CS (Stanford).
• 2008 Turing Award: for her work in the
design of programming languages and
software methodology that led to the
development of object-oriented
programming.
• So Database is not her major field, but
…
How to define isolation levels?
• Third edition: based on serialization graphs
• Three kinds of edges:
– Write dependency: write-write conflict
– Read dependency: write-read conflict
– Anti dependency: read-write conflict
• PL1: no cycles consisting entirely of write dependency
edges.
• PL2 (read committed): no cycles consisting entirely of write
and/or read dependency edges.
• PL3 (serializable): no cycles consisting of any edges. (This is
what we learned in class)
• Can be extended to other isolations like snapshot isolation
How to define isolation levels?
• Problem of graph based definitions: not very
intuitive for developers to understand
• Fourth edition: define isolation from the
client’s point of view
– “Seeing Is Believing: A Client-Centric Specification
of Database Isolation”, Natacha Crooks, Youer Pu,
Lorenzo Alvisi, and Allen Clement, PODC 2017
Formal definitions are the foundations of
any discussions in many cases.
Getting accurate definitions is not easy.
Outline
• Definition of different isolation levels
• Implementation of different isolation levels
Lock based implementation
• We have learned this in class
• Read uncommitted: release lock after
accessing the object
• Read committed: hold write lock to end of
transaction; release read lock after access
• Repeatable read: hold both locks to end of
transaction
• Serializability: Repeatable Read + Range lock
Optimistic Concurrency Control (OCC)
• Lock-based implementations are often called
pessimistic solutions
– They try to prevent bad interleaving from
happening.
• Key idea of OCC: execute transactions without
concurrency control, and validate whether
their execution is appropriate before commit.
Implementation of OCC
• Begin: record a timestamp for transaction
• Execute: read from DB and write to a new
copy
• Validate: check whether this transaction
conflicts with other transactions, depending
on isolation level
• Commit/rollback: if no conflict, commit
(merge the new copy into DB and update
timestamp); otherwise, rollback and retry
Implementation of OCC
• How to validate?
• For Snapshot Isolation: before T1 commits,
validate whether an object updated by T1 was
modified by others after T1 starts.
• For Serializability?
Implementation of OCC
• How to validate?
• For Serializability
– Option 1: Could build a serialization graph and
check, but this is usually computational heavy
– Option 2: Before T1 commits, validate whether an
object used (read or written) by T1 was modified
by others after T1 starts.
Implementation of OCC
• How to validate?
• For Serializability
– Option 2: Before T1 commits, validate whether an
object used (read or written) by T1 was modified
by others after T1 starts.
– Option 2 is correct, but is more restrictive than
Serializability. Can you find an example?
Implementation of OCC
• T1: R1(a) W1(b) Commit1
• T2: R2(b) W2(c) Commit2
• Execution log R1(a) R2(b) W1(b) Commit1 W2(c)
Commit2
• Commit2 will cause abort because b was
modified by T1, but it is unnecessary.
Locking vs OCC
• OCC: no locking when executing, abort when
detecting conflict
• Locking: locking when executing, no parallelism
when there is conflict
• Abort usually has a high overhead
• If your workload has no or little conflict, OCC is
better because it does not have the overhead of
locking.
• If your workload has a lot of conflicts, locking is
better, because OCC has many aborts.