1.2 Concurrency control implementation

This section covers the basic implementation techniques based on locks, namely (strict) two-phase locking, describes the properties satisfied by the (strict) 2PL protocol, and discusses SQL support for serializability.

Locking protocols

A lock is a value associated with each database object.

The basic locking protocol is two-phase locking (2PL): If a transaction wants to read (resp., write) an object, the scheduler first requests a shared/read (resp., exclusive/write) lock for the object from a lock manager. If a requested lock is not granted, the transaction is suspended until a lock is granted by the lock manager. All locks obtained by a transaction must be released before the transaction completes. In 2PL, all locks must be obtained before any locks are released.

Shared locks allow other transactions to read but not write the object; exclusive locks prevent other transactions from even reading the object.

Two-phase locking ensures schedules are serialisable.

Two-phase locking does not ensure schedules are strict.

Example 3 (U1, Ex. 9.18)

L1(A) R1(A) A+=100_1 W1(A) U1(A) L2(A) R2(A) L1(B) R1(B) A*=1.1_2 W2(A) U2(A) C2 B/=A_1 A1

Now, we must remove T1's lock on B, restore T1's initial value of A, rerun T2 (because it read an incorrect value of A) despite having committed, and we may need to rerun other transactions which read A after T2 wrote it.

The locking protocol used in practice is strict two-phase locking (strict 2PL): If a transaction wants to read (resp., write) an object, it first requests a shared (resp., exclusive) lock for the object. All locks obtained by a transaction must be held until the transaction commits or aborts. (The exact definition of when shared and exclusive locks may be released is subtle.)

Strict 2PL ensures schedules are serialisable and strict (doh).

Dynamic databases

When the set of database objects can change, the situation is more complex.

Suppose transaction T1 reads the Edinburgh accounts (Ai) in a bank and compares their sum with the stored total (T). Suppose transaction T2 inserts a new Edinburgh account (A) and updates the total.

R1(A1) R1(A2) R1(A3) I2(A) R2(T) W2(T) R1(T)

In this schedule, the total read by T1 is not the sum of the accounts it read, despite the fact that this is a 2PL schedule (even a strict 2PL schedule). This is called the phantom problem. To some extent it arises because we ignored the ``end of sequence'' item which transactions T1 and T2 both accessed. One solution is to use locks on the indexto the accounts in the bank, but this seems too implementation-oriented. Another solution is to use predicate locking, locking the set of all Edinburgh accounts, but this is expensive.

Other operations

Operations such as increment and decrement cause additional problems.

Deadlocks

Deadlocks may occur even with strict 2PL. (Conservative 2PL collect all required locks at once - can avoid this but reduces concurrency.) The DMBS needs to periodically check whether deadlocks have occurred and somehow break them.

One way to identify whether a deadlock has occurred is to maintain a waits-for graph. This graph has a node for each active transaction (in a schedule) and an arc from Ti to Tj iff T1 is waiting for Tj to release a lock. The lock managemer adds edges to this graph when it suspends transactions and removes edges when it grants lock requests. A schedule is deadlocked if the waits-for graph contains a cycle. The deadlock can be broken by aborting one of the transactions in the schedule (and later redoing it).

Transaction support in SQL

Transactions start with a SELECT, UPDATE, CREATE TABLE, and other similar statements. Transactions end with a COMMIT or ROLLBACK (abort) statement.

SQL allows programmers to especify three characteristics of a transaction: access mode, diagnostics size and isolation level. Access mode is READ ONLY, or READ WRITE. Diagnostics size only affects error reporting. Isoliation level is the important one which affects the extent to which transactions may be executed concurrently. The four possible isolation levels are (R&G, Fig. 16.10):

LevelDirty readUnrepeatable readPhantom
READ UNCOMMITTEDMaybeMaybeMaybe
READ COMMITTEDNoMaybeMaybe
REPEATABLE READNoNoMaybe
SERIALIZABLENoNoNo

The SERIALIZABLE level is the safest to use; it guarantees strict, serialisable schedules using predicate locking to avoid the phantom problem.

The isolation level and access mode can be set using the SET TRANSACTION command, e.g.,

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY

References

See Bernstein et al., Sections 3.1 to 3.7, for details.