IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-11, NO. 1, JANUARY 1985
A Priority Based Distributed Deadlock Detection Algorithm

MUKUL K. SINHA AND N. NATARAJAN
Abstract-Deadlock handling is an important component of transaction management in a database system. In this paper, we contribute to the development of techniques for transaction management by presenting an algorithm for detecting deadlocks in a distributed database system. The algorithm uses priorities for transactions to minimize the number of messages initiated for detecting deadlocks. It does not construct any wait-for graph but detects cycles by an edge-chasing method. It does not detect any phantom deadlock (in the absence of failures), and for the resolution of deadlocks it does not need any extra computation. The algorithm also incorporates a post-resolution computation that leaves information characterizing dependence relations of remaining transactions of the deadlock cycle in the system, and this will help in detecting and resolving deadlocks which may arise in the future. An interesting aspect of this algorithm is that it is possible to compute the exact number of messages generated for a given deadlock configuration. The complexity is comparable to the best algorithm reported. We first present a basic algorithm and then extend it to take into account shared and exclusive lock modes, simultaneous acquisition of multiple locks, and nested transactions.

Index Terms-Deadlock, deadlock detection, distributed database, nested transaction, priority, timestamp, transaction.

Manuscript received February 25, 1984; revised August 28, 1984.
The authors are with the National Centre for Software Development and Computing Techniques, Tata Institute of Fundamental Research, Bombay 400 005, India.

I. INTRODUCTION

In a database system, accesses to data items by concurrent transactions must be synchronized to preserve the consistency of the database. Locking is the most common mechanism used for access synchronization. When locking is used, a group of transactions (two or more) may sometimes get involved in a deadlock [5]: this is a situation in which each member of the group waits (indefinitely) for a data item locked by some member transaction of the group. Deadlocks can be resolved by aborting at least one of the transactions involved. A simple scheme that can be used to break a deadlock is to use timeouts and abort transactions when they have waited for more than a specified time interval after issuing a lock request. Alternatively, a deadlock can be detected using a specific algorithm for this purpose and resolved by aborting at least one of the transactions involved in the deadlock.

Using timeouts to handle deadlocks is only a brute force technique. Since in practice it is very difficult to choose a proper timeout interval, this technique may result in unnecessary transaction aborts. Another major drawback of this scheme is that it cannot avoid cyclic restarts [16]; i.e., a transaction may repeatedly be aborted and restarted. In contrast
0098-5589/85/0100-0067$01.00 © 1985 IEEE
to the timeout technique, a deadlock detection scheme aborts
a transaction only when the transaction is involved in a deadlock. Most deadlock detection schemes [8], [9], [12], [15]
detect deadlocks by finding cycles in a transaction wait-for
graph, in which each node represents a transaction, and a directed edge from one transaction to another indicates that the
former is waiting for a data item locked by the latter transaction. In a distributed database system, the problem is, in essence, of finding cycles in a distributed graph where no single
site knows the entire graph.
The deadlock detection scheme presented in this paper does
not construct any transaction wait-for graph, but follows the
edges of the graph to search for a cycle (called an edge-chasing
algorithm by Moss [13]). It is assumed that each transaction
is assigned a priority in such a way that priorities of all transactions are totally ordered. When a transaction waits for a data
item locked by a lower priority transaction, we say that an
antagonistic conflict has occurred. When an antagonistic conflict occurs for a data item, the waiting transaction initiates a
message to find cycles of transactions, in which each transaction is waiting for a data item locked by the next. If the message comes back to the initiating transaction, a deadlock cycle
is detected.
Our algorithm presumes a point-to-point network with a reliable message communication facility, and it is not applicable
for detecting communication deadlocks [4], [14].
The distinguishing features of the proposed deadlock detection scheme are as follows.
1) For a given deadlock cycle, it is possible to compute the exact number of messages that have been generated for the purpose of deadlock detection. If the number of messages generated is used as a complexity measure, the proposed algorithm is not inferior to any of the other algorithms reported in the literature.
2) When a deadlock is detected, the detector has information about the highest and the lowest priority transactions of the cycle, and this can be used for deadlock resolution. Thus, resolution does not need any new computation.
3) In the absence of failures (site failures or explicit abort of a waiting transaction by the user), it does not detect any phantom deadlock.
4) Even after a transaction is aborted to resolve a deadlock, other member transactions of the cycle continue to retain information about the remaining transactions. This, in turn, helps to detect, with fewer messages, deadlocks in which the remaining transactions (or any subset of them) may get involved in the future.
5) The resolution scheme adopted guarantees progress of computation, and avoids the problem of cyclic restart.
6) The basic algorithm can be easily extended to a locking scheme that provides both share locks and exclusive locks, and to a scheme in which a transaction can acquire several locks simultaneously.
7) It can also be extended to detect and resolve deadlocks which may occur in an environment where transactions can be nested within other transactions.
In the literature, several authors have proposed algorithms for deadlock detection in which a wait-for graph is not constructed explicitly [3], [13], [14]. In comparison to the algorithm of Chandy and Misra [3], our algorithm has the following advantages.
1) In our scheme, a deadlock computation is initiated only when an antagonistic conflict occurs. In contrast, in their scheme, a computation is initiated whenever a transaction begins to wait for another. Hence, our algorithm generates fewer messages to detect a deadlock.
2) In our scheme, there is no separate phase for deadlock resolution.
Our scheme has some similarities (e.g., initiation of a deadlock computation only when an antagonistic conflict occurs) with the algorithm proposed by Moss [13]. However, in comparison to his scheme, our algorithm has the following advantages.
1) In Moss' scheme, a transaction does not maintain any information regarding transactions that wait for it, directly or indirectly. Hence, his scheme requires transactions to initiate deadlock detection computations periodically. Thus, his scheme would, in general, require more messages, and it is not possible to compute the exact number of messages generated before a deadlock is detected.
2) In our scheme, a transaction continues to retain the above information even after the resolution of a deadlock, and this in turn speeds up detection and resolution of future deadlocks.
3) Our algorithm is less prone than Moss' scheme to detect phantom deadlocks that may involve nested transactions. In our scheme, a detected deadlock is made phantom only when a waiting transaction aborts, either explicitly or implicitly. In contrast, in Moss' scheme, sometimes a detected deadlock is made phantom even when an active transaction aborts, say due to some application considerations. We discuss this further in Section VI-C.
4) In our scheme, all messages have an identical short length, whereas Moss' scheme has messages of varying lengths.
In the following section, we introduce a distributed database model in order to set the context, and in Section III we describe the basic distributed deadlock detection algorithm. We analyze the cost of the algorithm in Section IV. The basic algorithm is applicable when only exclusive locks are used. However, it has been reported in the literature [9] that 80 percent of access is only for reading data. Taking this into account, we show in Section V how the basic algorithm can be modified to include share locks as well as simultaneous acquisition of multiple locks. In Section VI, we describe a nested transaction model and extend the algorithm to detect and resolve deadlocks taking into account nested transactions. We conclude the paper with suggestions for further improving the algorithm.

II. THE DISTRIBUTED DATABASE MODEL

A database is a structured collection of information. In a distributed database system, the information is spread across a collection of nodes (or sites) interconnected through a communication network. Each node has a system-wide, unique identifier, called the site-identification-number (site id, in short), and nodes communicate through messages.
All messages sent arrive at their destinations in finite time, and the network filters duplicate messages and guarantees that messages are error-free. The site-to-site communication is
pipelined, i.e., the receiving site gets messages in the same
order that the sending site has transmitted them.
Within a node, there are several processes and data items (or
objects). A process is an autonomous active entity that is
scheduled for execution. Every process has a system-wide
unique name, called process-id, and processes communicate
with each other through messages. To access one or more data
items, which may be distributed over several nodes, a user
creates a transaction process at the local node. The transaction process coordinates actions on all data items participating
in the transaction and preserves the consistency of the database. Henceforth, we use the term transaction to denote the
corresponding transaction process.
Data items are passive entities that represent some independently accessible piece of information. Each data item is maintained by a data manager which has the exclusive right to operate on a data item. If a transaction wants to operate on a data
item, it must send a request to the data manager that manages
the data item. A data manager can maintain several data items
simultaneously. However, to simplify the exposition, we shall
assume that a data manager maintains only one data item.
In addition to data manipulation operations, a data manager
provides two primitives to control access to the data item that
it maintains: Lock(data_item) and Un_Lock(data_item). A
transaction must lock a data item before accessing it, and it
must unlock the data item when it no longer needs to access it.
A data item can be in one of two lock modes, null or free (N,
i.e., absence of a lock), and exclusive (X, i.e., presence of a
lock). A data manager honors the lock request of a transaction
if the data item is free; otherwise it keeps the lock request
pending in a queue, called request_Q. A transaction which has
locked the data item is called the holder of the data, whereas a
transaction which is waiting in the request_Q is called a requester of the data item. When a holder unlocks the data item,
the data manager chooses a lock request from the request_Q,
and grants the lock to that requester. The scheduling scheme
followed by the data manager does not guarantee avoidance of
deadlocks [5], e.g., it may follow an arrival order scheduling
scheme.
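As a concrete illustration of this model, the locking behavior of a data manager might be sketched as follows. This is our own minimal Python sketch, not the authors' code; it covers exclusive locks only, with the arrival-order scheduling mentioned above, and the class and method names are illustrative.

```python
from collections import deque

class DataManager:
    """One data item, exclusive locks only, FIFO scheduling of requests."""
    def __init__(self):
        self.holder = None           # lock mode N (free) when holder is None
        self.request_q = deque()     # pending lock requests, arrival order

    def lock(self, txn):
        if self.holder is None:      # item free: grant immediately
            self.holder = txn
            return "granted"
        self.request_q.append(txn)   # otherwise the requester waits
        return "wait"

    def unlock(self, txn):
        assert txn == self.holder
        self.holder = None
        if self.request_q:           # schedule the next waiting request
            self.holder = self.request_q.popleft()
        return self.holder           # the new holder (or None)
```

Note that nothing in this scheduling prevents a cycle of waits from forming, which is exactly why the detection algorithm of Section III is needed.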
Transactions can be in one of two states: active or wait. If a
transaction waits in a request_Q of a data manager, it is in
wait state; otherwise it is in active state. An active transaction
process may or may not be running on a processor. The state
of a transaction changes from active to wait when its lock request for a data item is queued by the data manager in its request_Q. The state of the transaction changes from wait to
active when the data manager schedules its pending lock request. In either case, the manager informs the transaction of
its change of state. We assume that a transaction acquires locks
one after another (i.e., at any time it has only one outstanding
lock request), and it follows the two-phase lock protocol [7].
Each transaction is assigned a priority in such a way that priorities of all transactions are totally ordered. To assign priorities to transactions, we use the timestamp mechanism. When
a transaction is initiated, it is assigned a unique timestamp.
Timestamps induce priorities in the following manner: a transaction is of higher priority than another if the timestamp of
the former is less than that of the latter. Unlike the timestamp
69
synchronization scheme [2] which uses timestamps to schedule lock requests of transactions (and in turn, prevents deadlocks), here timestamps are used only to assign priorities to
transactions.
For generating timestamps, we assume that every node has a
logical clock (or counter) that is monotonically increasing, and
the various clocks are loosely synchronized [11]. A timestamp
generated by a node i is a pair (C, i) where C is the current
value of the local clock and i is the site-id of the node i.
Greater than (>) and less than (<) relations for timestamps
are defined as follows.
Let t1 = (C1, i1) and t2 = (C2, i2) be two timestamps. Then
t1 > t2 iff C1 > C2 or (C1 = C2 and i1 > i2);
t1 < t2 iff C1 < C2 or (C1 = C2 and i1 < i2).
Each transaction is denoted by an ordered pair of the form
(p, t), where p is the process-id of the corresponding transaction
process, and t is the timestamp of the transaction. The process-id is used for communication purposes.
If two transactions T1 and T2 are denoted by the pairs (p1, t1) and (p2, t2), respectively, we say that T1 > T2, i.e., the priority of T1 is higher than that of T2, if t1 < t2.
Further, we say that there is an antagonistic conflict at a data item if the item is locked, and there is a requester of higher priority than the holder. In such a case, we also say that the requester faces the antagonistic conflict.
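The timestamp ordering and the antagonistic-conflict test above can be sketched in Python. This is an illustrative sketch of ours, not the paper's code; the class and function names are our own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Timestamp:
    clock: int  # C: value of the local logical clock at initiation
    site: int   # i: site-id of the initiating node (breaks ties)

    def __lt__(self, other):
        # t1 < t2 iff C1 < C2 or (C1 = C2 and i1 < i2)
        return (self.clock, self.site) < (other.clock, other.site)

def higher_priority(t1, t2):
    # A transaction outranks another iff its timestamp is smaller (older).
    return t1 < t2

def antagonistic_conflict(requester_ts, holder_ts):
    # The conflict is antagonistic when the requester outranks the holder.
    return higher_priority(requester_ts, holder_ts)
```

Because a tie on the clock value is broken by the site-id, the resulting priorities are totally ordered, as the algorithm requires.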
III. DISTRIBUTED DEADLOCK DETECTION
AND RESOLUTION
In this algorithm, a deadlock is detected by circulating a
message, called probe, through the deadlock cycle. The occurrence of an antagonistic conflict at a data site triggers initiation of a probe. A probe is an ordered pair (initiator, junior),
where initiator denotes the requester which faced the antagonistic conflict, triggering the deadlock detection computation,
and initiating this probe. The element junior denotes the
transaction whose priority is the least among transactions
through which the probe has traversed.
A data manager sends a probe only to the holder of its data
while a transaction process sends a probe only to the data
manager from which it is waiting to receive the lock grant.
Transaction processes (or data managers) never communicate
among themselves for purposes of deadlock detection.
A. The Basic Deadlock Detection Algorithm
The basic deadlock detection algorithm has three steps.
1) A data manager initiates a probe in the following two
situations.
a) When the data item is locked by a transaction, if a lock
request arrives from another transaction, and requester >
holder, the data manager initiates a probe and sends it to the
holder.
b) When a holder releases the data item, the data manager
schedules a waiting lock request. If there are more lock requests still in the request_Q, then for each lock request for
which requester > new holder, the data manager initiates a
probe and sends it to the new holder.
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-11, NO. 1, JANUARY 1985
70
When a data manager initiates a probe, it sets
initiator := requester;
junior := holder;
We shall presently assume that a data manager sends a probe
as soon as the above situations occur. However, as we shall
elaborate in Section VII, in order to improve performance, a
data manager can wait for a while before sending a probe.
2) Each transaction maintains a queue, called probe_Q,
where it stores all probes received by it. The probe_Q of a
transaction contains information about the transactions which
wait for it, directly or transitively. Since we have assumed that
a transaction follows the two phase lock protocol, the information contained in the probe_Q of a transaction remains valid
until it aborts or commits.
After a transaction enters the second phase of the two phase
lock protocol, it can never get involved in a deadlock. Hence,
when it enters the second phase, it discards the probe_Q. During the second phase, any probe or clean message (discussed
later in this section) received is ignored.
A transaction sends a probe to the data manager where it is waiting, in the following two cases.
a) When a transaction T receives probe(initiator, junior),
it performs the following.
if junior > T
then junior := T;
save the probe in the probe_Q;
if T is in wait state
then transmit a copy of the saved probe to the data manager
where it is waiting;
b) When a transaction issues a lock request to a data manager and waits for the lock to be granted (i.e., it goes from active to wait state), it transmits a copy of each probe stored in
its probe_Q to that data manager.
3) When a data manager receives probe(initiator, junior)
from one of its requesters, it performs the following.
if holder > initiator
then discard the probe
else if holder < initiator
then propagate the probe to the holder
else declare deadlock and initiate deadlock resolution;
When a deadlock is detected, the detecting data manager has
the identities of two members of the deadlock cycle, initiator
and junior, i.e., the highest and the lowest priority transactions, respectively. In order to guarantee progress, we choose
to abort junior, i.e., the lowest priority transaction (hereafter
called victim). When victim restarts, its priority does not
change, i.e., it uses the same timestamp that was assigned to it
when it was initiated.
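The three steps above can be sketched as follows. This is our own illustrative Python rendering, not the authors' code; transactions are represented by their timestamps as plain integers, with a smaller number meaning an older timestamp and hence a higher priority (so the paper's requester > holder becomes requester < holder on the integers).

```python
def dm_on_conflicting_request(dm, requester, send):
    """Step 1a: the item is locked; queue the request and initiate a
    probe if the requester outranks the holder (antagonistic conflict)."""
    dm["request_q"].append(requester)
    if requester < dm["holder"]:                       # requester > holder
        send(dm["holder"], (requester, dm["holder"]))  # probe(initiator, junior)

def txn_on_probe(txn, probe, send):
    """Step 2a: update junior if this transaction is the lowest-priority
    one seen so far, save the probe, and forward it if waiting."""
    initiator, junior = probe
    if txn["id"] > junior:        # junior > txn in priority: txn is new junior
        junior = txn["id"]
    probe = (initiator, junior)
    txn["probe_q"].append(probe)
    if txn["waiting_at"] is not None:
        send(txn["waiting_at"], probe)

def dm_on_probe(dm, probe):
    """Step 3: discard the probe, propagate it to the holder, or
    declare a deadlock when the holder is the initiator."""
    initiator, junior = probe
    if dm["holder"] < initiator:       # holder > initiator: probe dies
        return ("discard",)
    if dm["holder"] > initiator:       # holder < initiator: keep chasing
        return ("propagate", dm["holder"])
    return ("deadlock", junior)        # victim = lowest-priority member
```

When `dm_on_probe` reports a deadlock, the pair it already holds (initiator, junior) identifies the highest and lowest priority members of the cycle, which is why no further computation is needed to pick the victim.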
B. The Deadlock Resolution and Post-Resolution
Computation
This consists of the following three steps.
1) To abort the victim, the data manager that detects the
deadlock sends an abort signal to the victim. The identity
of the initiator is also sent along with the abort signal: abort
(victim, initiator). Since victim is aborted, it is necessary to
discard those probes (from probe_Qs of various transactions) that have victim as their junior or initiator. Hence, on receiving an abort signal, the victim does the following.
a) It initiates a message, clean(victim, initiator), sends it
to the data manager where it is waiting, and enters the abort
phase. Since initiator is the highest priority transaction of
the deadlock cycle, its probe_Q will never contain any probe
generated by other members of the cycle. Consequently,
probe_Qs of transactions, from initiator to victim in the direction of probe traversal, will not contain a probe having victim
either as junior or as initiator. And hence, the clean message
carries the identity of initiator beyond which it need not
traverse.
b) In abort phase, the victim releases all locks it held,
withdraws its pending lock request, and aborts. During this
phase, it discards any probe or clean message that it receives.
2) When a data manager receives clean(victim, initiator)
message, it propagates the message to its holder.
3) When a transaction T receives clean(victim, initiator)
message, it acts as follows.
purge from the probe_Q every probe that has victim as its
junior or initiator;
if T is in wait state
then if T = initiator
then discard the clean message
else propagate the clean message to the data manager
where it is waiting
else discard the clean message;
A transaction discards a clean message in the following two
situations: 1) the transaction is in the active state, or 2) the transaction is the same as the initiator of the clean message received.
After "cleaning" up its probe_Q as described above, each
member transaction of the deadlock cycle continues to retain
the remaining probes in its probe_Q. In the future, if the remaining members (or any subset of them) get involved in a deadlock cycle, it will be detected with fewer messages, since probes have already traversed some edges of the cycle.
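The transaction side of the clean-message handling above could look like this in Python. It is our own sketch, consistent with the integer-priority convention used in this paper's steps; a probe is the pair (initiator, junior).

```python
def txn_on_clean(txn, clean, send):
    """Purge probes naming the victim, then propagate the clean message
    only if this transaction is waiting and is not the initiator."""
    victim, initiator = clean
    txn["probe_q"] = [
        (ini, jun) for (ini, jun) in txn["probe_q"]
        if victim not in (ini, jun)          # drop probes naming the victim
    ]
    if txn["waiting_at"] is not None and txn["id"] != initiator:
        send(txn["waiting_at"], clean)       # keep cleaning along the cycle
        return "propagated"
    return "discarded"
```

Note that probes not naming the victim survive the purge; this retained information is what speeds up detection of future deadlocks among the remaining members.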
IV. THE COST OF DEADLOCK DETECTION
To compare our algorithm to other deadlock detection and
resolution algorithms, we consider three factors which determine the cost of any deadlock detection algorithm:
1) Communication Cost: the number of messages that must
be exchanged to detect a deadlock;
2) Delay: the time needed to detect a deadlock once the
deadlock cycle is formed (presuming that every message exchange, whether it is an intersite communication or an intrasite communication, takes equal time); and
3) Storage Cost: the amount of storage needed by transactions and data managers specifically for purposes of deadlock
detection and resolution.
In our scheme, the communication and the delay costs of detecting a deadlock depend on the configuration of a deadlock
cycle. The configuration indicates which transaction waits for
which other transaction. We describe a configuration using a
Fig. 1. An edge of a TWFG.
transaction wait-for graph (TWFG) [10] with the following
convention.
In a TWFG, nodes and edges are associated with transactions
and data items, respectively. The direction of an edge from
one transaction to another indicates that the former is waiting
for the latter.
For example, Fig. 1 indicates a conflict where the data item Obji is locked by a transaction Ti and the transaction Tj is waiting to acquire the lock. We shall call the data manager of Obji as Di. If Tj > Ti, the conflict is antagonistic, and the data manager Di will initiate a deadlock detection computation by initiating probe(Tj, Ti), and sending it to the transaction Ti.
A data item can have many requesters but only one holder,
and hence, in a TWFG, a node can have several incoming edges
but at most one outgoing edge.
A. The Communication Cost
We analyze the communication cost of our algorithm by
considering three kinds of configurations of a deadlock cycle.
The order of priority among transactions is assumed as follows:
Ti > Tj if i < j.
The Best Configuration: For our algorithm, the best deadlock configuration, i.e., the configuration for which the deadlock is detected with the minimum number of messages, is the one
in which only one edge of the cycle causes an antagonistic
conflict.
For example, consider the configuration illustrated in Fig. 2.
Except at the site of ObjN, where T1 waits for TN and T1 >
TN, there is no antagonistic conflict at any other site. The
data manager DN initiates probe(T1, TN) and sends it to the
transaction TN. On receiving the probe, TN stores it in its
probe_Q, and propagates it to DN-1. In two steps, a probe
travels from one data manager to the next data manager of the
TWFG.
On receiving probe(T1, TN), the data manager DN-1 compares its holder TN-1 to the initiator T1 of the probe. Since T1 > TN-1, it propagates probe(T1, TN) to its holder, i.e., TN-1. The transaction TN-1, in turn, stores probe(T1, TN)
in its probe_Q, and propagates it to the data manager DN-2,
and so on.
When the data manager D, finally receives probe(T1, TN)
from the requester T2, it finds that its holder is the same as the
initiator of the probe, and hence, it detects the deadlock. In
this case, the total number of messages generated is 2 * (N - 1).
Fig. 2. Deadlock cycle: best configuration.

Fig. 3. Deadlock cycle: intermediate configuration.

Fig. 4. Deadlock cycle: worst configuration.

An Intermediate Configuration: Consider the deadlock configuration of Fig. 3. In comparison to the previous configuration, the positions of T2 and T3 are swapped at the Obj2 site. Thus, apart from the data item ObjN, the cycle has one more antagonistic conflict, at data item Obj2. Similar to DN, the data manager D2 also initiates probe(T2, T3), and sends it to the transaction T3. T3 stores it in its probe_Q, and since it has an outstanding lock request for data item Obj1, it propagates the probe to D1. When the data manager D1 receives probe(T2, T3), it discards it since initiator < holder (i.e., T2 < T1). Hence, the probe initiated at the Obj2 site dies after two steps.
As in the previous configuration, in this configuration as well,
the deadlock will be detected only when the probe initiated by
the data manager DN traverses through the entire cycle, and
eventually reaches D1 after 2 * (N - 1) steps. Hence, in this
case, the total number of messages generated is 2 * (N - 1) + 2.
The Worst Configuration: By induction, we can infer that
the worst deadlock configuration, i.e., the one which will generate the
maximum number of messages before the deadlock is detected,
is the one in which each edge of the cycle except one causes an
antagonistic conflict.
For example, consider Fig. 4, in which there are (N - 1)
antagonistic conflicts. All data managers, except D1, initiate
a probe. All probes traverse up to the data manager D1 and
terminate, except the probe initiated by DN which leads to
the detection of a deadlock. Hence, the total number of messages generated will be
2 * (N - 1) + 2 * (N - 2) + 2 * (N - 3) + ... + 2 = N * (N - 1).
In general, for a deadlock cycle of length N, there are (N - 1)! possible deadlock configurations. For a specific deadlock configuration, the total number of messages generated will be

2 * (N - 1) + CN-1 * 2 * (N - 2) + CN-2 * 2 * (N - 3) + ... + C2 * 2

where CI is 1 if an antagonistic conflict exists at data item ObjI, and 0 otherwise. For the above expression, the maximum and minimum values are N * (N - 1) and 2 * (N - 1),
72
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL.
SE-ll,
NO. 1, JANUARY 1985
respectively. For N = 2, the maximum and the minimum are identical, namely 2.
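The cost expression above can be evaluated directly for any configuration. The small Python helper below (our own sketch, with hypothetical names) makes the three worked cases explicit; conflict_sites lists the indices I (2 <= I <= N - 1) at which an extra antagonistic conflict exists, the conflict at ObjN being always present.

```python
def message_count(N, conflict_sites):
    """Messages generated before detection, for a cycle of length N."""
    total = 2 * (N - 1)              # the detecting probe initiated by DN
    for I in conflict_sites:         # each extra conflict at ObjI spawns a
        total += 2 * (I - 1)         # probe that dies at D1 after 2*(I-1) msgs
    return total
```

The best configuration is message_count(N, []) = 2 * (N - 1); the intermediate configuration of Fig. 3 is message_count(N, [2]) = 2 * (N - 1) + 2; and the worst configuration, with conflicts at every site from 2 to N - 1, yields N * (N - 1).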
B. The Delay
The delay is defined to be the time taken to detect the deadlock after the deadlock cycle is formed. Note that irrespective
of the configuration of a deadlock cycle of length N (best,
worst, or any intermediate), the maximum amount of delay is
the time taken to exchange 2 * (N - 1) messages. The delay is
maximum if the highest priority transaction of the cycle is the
last transaction to enter the wait state, closing the deadlock
cycle. If a transaction other than the highest priority transaction is the last to enter the wait state, the delay is less. This
is because the probe initiated by the highest priority transaction would have traversed part of the cycle before the cycle is
formed.
Suppose, in the configuration shown in Fig. 2 (prior to the formation of a deadlock cycle), all edges except the edge TJ+1 - TJ (where 1 <= J <= N - 1) are formed, i.e., TJ+1 is still active. When TJ+1 requests a lock on data item ObjJ held by TJ, it enters the wait state, closing the deadlock cycle.
Case 1: If probe(T1, TN), initiated due to the antagonistic conflict T1 - TN, has reached the transaction TJ+1 before it entered the wait state, the delay to detect the deadlock will be equal to the time taken to exchange (2 * J - 1) messages.
Case 2: Suppose probe(T1, TN) is yet to reach the transaction TN, i.e., transactions T1 and TJ+1 entered the wait state in quick succession (closing the deadlock cycle), and the time gap was too small compared to the time taken to exchange one message. In this case, the delay to detect the deadlock will be equal to the time taken to exchange 2 * (N - 1) messages.
Hence, if a deadlock cycle is closed by transaction TJ+1, then the time taken to detect the deadlock will be anywhere between (2 * J - 1) and 2 * (N - 1) message exchanges, for J = 1, ..., (N - 1).
For the configuration given in Fig. 2, the delay will be minimum (i.e., the time taken to exchange one message) if 1) the cycle is closed by transaction T2 waiting for T1, the highest priority transaction of the cycle, and 2) the probe initiated due to the antagonistic conflict T1 - TN has reached T2 before the latter entered the wait phase.
From this result we can generalize that, for any configuration, the minimum time taken to detect a deadlock is the time taken for the exchange of one message, and this can happen only when 1) the cycle is closed by a transaction waiting for the highest priority transaction of the configuration, and 2) the probe initiated by the highest priority transaction had reached the cycle-closing transaction before the latter entered the wait phase.

C. The Storage Cost

In this algorithm, each transaction requires storage space to maintain its probe_Q, and a probe_Q exists until the transaction enters the second phase of the two-phase lock protocol. The size of a probe_Q depends upon the number of higher priority transactions which wait for it, directly or transitively. A probe_Q shrinks only when the transaction receives a clean message, but not otherwise. If the maximum number of transactions that can run concur-

D. Costwise Comparison to Other Algorithms

In comparison to the algorithm of Chandy and Misra [3], our algorithm has a lower communication cost since it initiates a deadlock computation only upon the occurrence of antagonistic conflicts, and not otherwise. Furthermore, the resolution of deadlock does not involve any extra cost.

Unlike Moss' algorithm [13], we have separated the cost of reliable network communication from that of deadlock detection. Incorporating this distinction in our algorithm enables us to compute exact communication and delay costs of deadlock detection for a given configuration.

In the distributed database model considered by Obermarck [15], transactions migrate from one data site to another, and there is a deadlock detector at each site which builds a transaction wait-for graph for that site (by extracting information from lock tables, and other resource allocation tables and queues). In computing the communication cost to detect a deadlock cycle (which is N * (N - 1)/2 message exchanges, in the worst case, among deadlock detectors), he does not include the expenses of transaction migration and of construction of a TWFG by deadlock detectors in terms of messages. In contrast, in our model, the transmission of information from a transaction to a data manager and from a data manager to a transaction costs one message each. If the above two expenses are also included in terms of messages, the communication cost of his algorithm will become equal to that of ours.

V. EXTENSIONS TO THE DEADLOCK DETECTION ALGORITHM

In this section, we extend the algorithm to take care of two refinements:
1) availability of a share lock (S_lock) mode as well, and
2) allowing a transaction to acquire locks on more than one data item simultaneously, either in share mode or in exclusive mode.

A. Share and Exclusive Locks

The Distributed Database Model with Share and Exclusive Locks: We extend the basic model, discussed in Section II, by distinguishing a share lock (S_lock) request from an exclusive lock (X_lock) request. Correspondingly, a locked data item can be either in S_mode or in X_mode. The desired lock mode is specified as a parameter of the lock request primitive: Lock(data-item, mode). In order to distinguish between the two kinds of lock requests, a data manager splits its request_Q into Srequest_Q and Xrequest_Q, for storing pending S_lock and X_lock requests, respectively.

If a data item is free, a transaction can lock it in any mode. When a transaction has locked a data item in X_mode, and become the X_holder, no other transaction can lock the data item in any mode. A transaction can lock a data item in S_mode, and become an S_holder, even if the item is already locked in S_mode. Thus, a data item in S_mode can have several S_holders, whereas it can have only one X_holder. When the X_holder releases the lock, if the data manager decides to honor S_lock requests, we assume that all S_lock requests queued in Srequest_Q are scheduled simultaneously.

SINHA AND NATARAJAN: DISTRIBUTED DEADLOCK DETECTION ALGORITHM

Fig. 5. (a) A TWFG where a probe gets discarded. (b) A deadlock caused by an incremental share lock remains undetected by the basic algorithm.

We note that with this scheduling policy, it is possible that an X_requester may starve. Hence, this policy is unfair. We shall discuss this issue later in Section VII.
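The lock-mode rules above can be sketched as follows (an illustrative Python fragment; the class and method names are ours, and the policy of always preferring queued S_lock requests on release is one choice a data manager might make):

```python
class DataItem:
    """Sketch of a data item with S/X lock modes: S_mode admits many
    holders, X_mode exactly one. Pending requests are split into
    Srequest_Q and Xrequest_Q, and all queued S_lock requests are
    scheduled together when the item becomes free (the batch policy
    described in the text)."""
    def __init__(self):
        self.mode = None          # None (free), 'S', or 'X'
        self.holders = set()
        self.Srequest_Q = []
        self.Xrequest_Q = []

    def lock(self, txn, mode):
        if self.mode is None:                 # free: grant any mode
            self.mode = mode
            self.holders.add(txn)
            return True
        if self.mode == 'S' and mode == 'S':  # additional S_holder
            self.holders.add(txn)
            return True
        # conflicting request: queue it and let the transaction wait
        (self.Srequest_Q if mode == 'S' else self.Xrequest_Q).append(txn)
        return False

    def release(self, txn):
        self.holders.discard(txn)
        if self.holders:
            return
        self.mode = None
        if self.Srequest_Q:        # schedule ALL pending S_locks at once
            self.mode = 'S'
            self.holders = set(self.Srequest_Q)
            self.Srequest_Q = []
        elif self.Xrequest_Q:
            self.mode = 'X'
            self.holders = {self.Xrequest_Q.pop(0)}
```

Note that this batch grant is exactly what makes the policy unfair to X_requesters: as long as S_lock requests keep arriving, an X_requester can remain queued.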
Since the S_holders of a data item can be many, an X_requester may now wait for more than one transaction simultaneously; i.e., in a TWFG, a node can have several incoming as well as outgoing edges.
Deadlock Detection and Resolution: With the availability of S_locks, it is now possible that the S_holders of a data item may increase incrementally. Consequently, antagonistic conflicts for data items may occur incrementally. To take this into account, a data manager has to initiate a probe in one more situation, apart from those discussed in the basic algorithm [refer to Section III-A, step 1)].
When a data manager grants an additional S_holder Ts, it performs the following.

if Xrequest_Q is not empty
then for each X_requester, Tx,
     if Tx > Ts
     then initiate probe(Tx, Ts) and send it to Ts;
However, this modification alone is not enough since it does not take into account transactions that now wait transitively for the additional S_holder. We shall elaborate this through an example.
Consider the scenario shown in Fig. 5(a), where Ti > Tj for all i < j. The data item Obj1 is share locked by T1, and the data items Obj4 and Obj2 are exclusively locked by T4 and T2, respectively. Transactions T4 and T2 wait for exclusive locks to be granted on data items Obj1 and Obj4, respectively.

Unlike the data manager D1, D4 finds the antagonistic conflict T2_T4, initiates probe(T2, T4), and sends it to its holder T4. T4 saves the probe in its probe_Q, and propagates it to D1, where it is an X_requester. On receiving probe(T2, T4), D1 discards it since its holder T1 is of higher priority than T2, the initiator of the probe.

Some time later, another transaction T3 requests an S_lock on the data item Obj1. Since Obj1 is in S_mode, D1 grants the S_lock request of T3 immediately. T3 is the additional S_holder of Obj1, and now T4 waits for T3 as well. Since T3 > T4, D1 does not initiate any probe. Later, T3 requests an X_lock for data item Obj2 (held by T2), and waits. As illustrated in Fig. 5(b), this request forms the deadlock cycle T3-T2-T4-T3, which has only one antagonistic conflict, i.e., T2_T4. But the probe(T2, T4) initiated due to this conflict was discarded by D1 before T3 acquired the S_lock on Obj1. Hence, this deadlock will remain undetected.

To handle such cases, a data manager, when it grants an S_lock to an additional S_holder Ts, needs to propagate to Ts copies of the probes (perhaps only some of them) received prior to granting the S_lock to Ts. However, in the basic scheme, a data manager does not preserve the probes it receives. There are two possible solutions to this problem.

1) When a data manager schedules an additional S_holder Ts, it asks all X_requesters queued in its Xrequest_Q to retransmit their probe_Q elements so that relevant probes can be propagated to Ts.

2) Alternatively, a data manager keeps all probes received in its own probe_Q, and later, when it schedules an additional S_holder Ts, it checks, for each probe in its probe_Q, whether the initiator of the probe is of greater priority than Ts, and if so, propagates that probe to Ts.

The former scheme adds complexity since a data manager must keep track of its requests for probe retransmission and must distinguish an original probe from a retransmitted duplicate. Further, the communication cost for a given configuration cannot be specified exactly. The latter scheme necessitates storage space within each data manager, but the algorithm remains simple, and the communication cost of a deadlock configuration can be specified exactly. Hence, we use the latter scheme and modify the basic algorithm as follows.

1) When a data manager receives probe(initiator, junior) from one of its requesters, it performs the following.

if the data item is in S_mode
then save the probe in the probe_Q;
for each holder
do
  if holder = initiator
  then declare deadlock and initiate deadlock resolution
  else if holder < initiator
       then propagate a copy of the probe to the holder;

2) When a data manager grants an additional S_holder Ts, it performs the following.

if Xrequest_Q is not empty
then for each X_requester, Tx,
     do
       if Tx > Ts
       then initiate probe(Tx, Ts) and send it to Ts;
if the probe_Q is not empty
then for each probe, P, in its probe_Q
     do
       if Ts < P.initiator
       then propagate a copy of P to Ts;

3) When a data manager exits from S_mode, it discards its probe_Q.

Post-Resolution Computation: The provision of S_mode requires only a minor modification in the deadlock resolution and post-resolution computation. Step 2) of Section III-B is modified as follows.
When a data manager receives a clean message
if the data item is in X_mode
then propagate the clean message to the X_holder
else for each S_holder, Ts,
     do propagate a copy of the clean message to Ts;
if probe_Q is not empty
then purge every probe that has victim as junior or initiator;
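The data manager's probe bookkeeping in this extension (saving probes while the item is in S_mode, forwarding the relevant ones to a new S_holder, and purging on a clean message) might be sketched as follows. Priorities are modeled as comparable keys where a larger key means higher priority; all names are illustrative, and the probe initiation toward queued X_requesters in step 2) is omitted for brevity:

```python
class Probe:
    def __init__(self, initiator, junior):
        self.initiator = initiator   # priority of the probe's initiator
        self.junior = junior         # lowest priority seen on the path

class SModeManager:
    """Probe bookkeeping for a data item currently in S_mode."""
    def __init__(self, holders):
        self.holders = dict(holders)   # holder name -> priority
        self.probe_Q = []

    def on_probe(self, probe, forward):
        """Step 1): save the probe; declare deadlock or propagate
        per holder, depending on the holder's priority."""
        self.probe_Q.append(probe)
        for name, prio in self.holders.items():
            if prio == probe.initiator:
                return 'deadlock'
            if prio < probe.initiator:
                forward(name, probe)   # propagate a copy to this holder
        return None

    def on_new_s_holder(self, name, prio, forward):
        """Step 2): forward saved probes whose initiator outranks Ts."""
        self.holders[name] = prio
        for probe in self.probe_Q:
            if probe.initiator > prio:
                forward(name, probe)

    def on_clean(self, victim):
        """Post-resolution: purge probes naming the victim."""
        self.probe_Q = [p for p in self.probe_Q
                        if victim not in (p.junior, p.initiator)]
```

In the Fig. 5 scenario, the saved probe(T2, T4) would be forwarded to the late-arriving S_holder T3 by `on_new_s_holder`, which is precisely what the basic algorithm failed to do.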
Storage Cost: This extended algorithm requires extra storage within each data manager for maintaining its own probe_Q. The probe_Q within a data manager exists only as long as the data item is in S_mode. As soon as the data item becomes free or enters X_mode, the probe_Q is discarded.
Delay and Communication Cost: In the original database model, if a transaction enters the wait state, it can close at most one deadlock cycle (in a TWFG, a node can have at most one outgoing edge). But in a TWFG for the extended model, a node can have several incoming and outgoing edges, and the formation of an edge may close many cycles simultaneously. Given an acyclic TWFG of n nodes, the maximum number of cycles (say M) which can be closed simultaneously by the formation of a single edge is expressed by the following equation:

M = C(n-1, 1) + C(n-1, 2) + · · · + C(n-1, n-1)

where C(n-1, k) denotes the binomial coefficient ("n-1 choose k"): C(n-1, 1) is the number of cycles of length 2, C(n-1, 2) is the number of cycles of length 3, and so on.
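The count M can be checked numerically; summed over all k, it telescopes to 2^(n-1) - 1:

```python
from math import comb

def max_cycles_closed(n):
    """Maximum number of cycles closed by adding one edge to an
    acyclic TWFG of n nodes: sum of C(n-1, k) for k = 1 .. n-1."""
    return sum(comb(n - 1, k) for k in range(1, n))

# For the four-transaction TWFG of Fig. 6, a single edge closes
# 3 + 3 + 1 = 7 cycles.
```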
Depending upon the type of configuration for each cycle, we
can calculate the delay and the communication cost based on
the formula given in Section IV-A.
For example, consider the TWFG given in Fig. 6(a). All data items are locked by S_lock requests and all edges are due to waiting X_lock requests. Ti > Tj for all i < j. Obj1 is share locked by T1; Obj2 is share locked by T1 and T2; Obj3 by T1, T2, and T3; and Obj4 by T2, T3, and T4. The X_lock requests of T2, T3, and T4 wait for data items Obj1, Obj2, and Obj3, respectively. Until now, there is no deadlock cycle in the TWFG.
When T1 issues an X_lock request for the data item Obj4 and waits, as illustrated in Fig. 6(b), it simultaneously closes seven cycles. (The number can be derived from the above equation.) Three cycles of length 2 (viz. T2-T1-T2, T3-T1-T3, and T4-T1-T4), three cycles of length 3 (viz. T3-T2-T1-T3, T4-T2-T1-T4, and T4-T3-T1-T4), and one cycle of length 4 (viz. T4-T3-T2-T1-T4) are formed. Though there are seven cycles in the TWFG, there exist only three antagonistic conflicts: T1_T2, T1_T3, and T1_T4. Hence, only three probes will originate.

Since T1 is the highest priority transaction of every cycle, all probes will have T1 as their initiator, and all deadlock cycles will be independently detected by the various data managers for which T1 is an S_holder. Since the algorithm chooses the lowest priority transaction as the victim, all transactions except T1 will be junior in at least one of the three probes, and hence, in the worst case, all transactions except T1 may get aborted. On the contrary, if the initiator is chosen to be the victim, then all cycles can be broken simultaneously by aborting only T1.
Fig. 6. (a) A TWFG with multiple outgoing edges. (b) An X_lock request by T1 simultaneously closes seven cycles.
However, this latter scheme may result in cyclic restart of the transaction T1.

In the case of multiple cycles, early abortion of one transaction may resolve many cycles simultaneously. For example, if T2 gets aborted on detection of the T2-T1-T2 cycle, the cycles T3-T2-T1-T3, T4-T2-T1-T4, and T4-T3-T2-T1-T4 will also get resolved simultaneously. This may result in the discarding of many probes and clean messages. Hence, in this case, we can compute only the limits (best and worst) of the delay and communication cost for a specific configuration. The exact cost will depend upon many other factors, such as the scheduling policy of data managers, the characteristics of the communication substrate, etc.
B. Simultaneous Acquisition of Multiple Locks

Let us now consider the refinement which allows a transaction to issue more than one lock request simultaneously. If its requests are not granted immediately, a transaction simultaneously waits for a number of transactions (in a TWFG, a node will have several outgoing edges).

The modification needed for step 2) of the basic algorithm of Section III-A is as follows.

When a transaction issues more than one lock request simultaneously, if all lock requests are not granted immediately (i.e., it waits for multiple locks), it sends a copy of each probe stored in its probe_Q to all data managers for which it is a requester.

Now, in the TWFG, a transaction can be the tail of multiple edges. The nature of this wait-for graph is the same as that caused by multiple S_holders, and hence its characteristics will also be the same.
From the above argument, we can deduce that, in a model that provides share as well as exclusive lock requests, and also allows a transaction to issue more than one lock request simultaneously, the characteristics of the graph as well as the complexity of deadlock detection will be similar to those described in the previous subsection.
VI. HANDLING NESTED TRANSACTIONS
We shall now discuss the applicability of our algorithm to detect deadlocks that may occur in an environment where a transaction can be nested within another transaction. The concept
of a nested transaction permits a transaction to decompose its
task into several subtasks and initiate a new transaction (called
nested transaction or subtransaction) to perform each of the
subtasks. A nested transaction, in turn, may initiate its own
set of nested transactions, thus giving rise to a hierarchy (or
tree) of transactions. Since nesting of transactions follows a
tree structure, we use the terms root, leaf, parent, child, ancestor, and descendant with the usual connotations. Using nested
transactions, it is possible to achieve higher concurrency, and
higher degree of resilience against failures [6].
A. A Model for Nested Transactions
During its execution, a transaction can create a set of nested
transactions, which will be its children, simultaneously. After
creating its children, a parent transaction cannot resume execution until all its children commit or abort. However, a (parent)
transaction may abort at any time, either explicitly because a
child aborted, or implicitly because an ancestor aborted. A
transaction, whether nested or not, always has the properties
of failure atomicity and concurrency transparency. However,
a nested transaction has an additional property: even if a nested
transaction commits, this commitment is only conditional and
the commitment of its effects, i.e., installation of the new
states of the objects modified by it, is dependent on whether
its parent transaction commits or not. This commit dependency follows from the property of atomicity. We allow arbitrary nesting of transactions, and hence the commit dependency is transitive.
Consider the transaction tree shown in Fig. 7. If A, B, and F
are three transactions such that A created B which then created
F, the effects of F must be committed only when both B and
A commit. It should be noted that the commit dependency
relation is asymmetric: only children are dependent on their
parents and not vice versa. Thus, a transaction may commit
even if some (or all) of its children are aborted.
Once all its children commit or abort, a parent transaction
can resume execution, and it can create a new set of children.
A transaction is in wait state if either
1) it is waiting for locks to be granted on some data items, or
2) it is waiting for its children to commit or abort.
Note that a transaction never runs concurrently with its
children.
The commit dependency described above necessitates new
locking rules. This is required because it is not the case that
when a transaction commits, its effects become visible to any
transaction. The visibility of effects of a transaction is governed
by the following rule [14].
Fig. 7. A transaction tree.
The Visibility Rule: When a transaction A commits, the effects of the transaction tree rooted at A are visible to a transaction X that is external to the transaction tree only if either
1) the root transaction has no parent, or
2) the parent of the root is either X or an ancestor of X.
As an example, consider the transaction tree illustrated in Fig. 7. The effects of D will be invisible to F when D commits. Only when C commits do the cumulative effects of C, D, and E become visible to F. When C aborts, the transaction tree rooted at C has no effect, even if D and E have committed earlier.
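The visibility rule reduces to an ancestor test on the transaction tree; a minimal sketch (Python, with a parent map as our own encoding of the tree, and the Fig. 7 shape as inferred from the text: A is the root with children B and C, C has children D and E, and B has child F):

```python
def ancestors(t, parent):
    """Proper ancestors of t, walking the parent map up to the root."""
    result = []
    while parent.get(t) is not None:
        t = parent[t]
        result.append(t)
    return result

def effects_visible(a, x, parent):
    """Visibility rule: when transaction a commits, the effects of
    the tree rooted at a become visible to an external transaction x
    only if a has no parent, or a's parent is x or an ancestor of x."""
    p = parent.get(a)
    return p is None or p == x or p in ancestors(x, parent)

# Transaction tree of Fig. 7 (assumed shape, as inferred from the text).
tree = {'A': None, 'B': 'A', 'C': 'A', 'D': 'C', 'E': 'C', 'F': 'B'}
```

With this tree, `effects_visible('D', 'F', tree)` is false (D's parent C is neither F nor an ancestor of F), while `effects_visible('C', 'F', tree)` is true, matching the example above.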
In order to implement the above visibility rule through a locking scheme, we introduce the notion of a retainer of a data item, through the following set of rules [13], [14].
1) When an S_holder (X_holder) of a data item commits, it releases the lock it held, and the parent of the holder, if any, becomes an S_retainer (X_retainer) of the data item, unless it is already an S_retainer (X_retainer) of that item.
2) When an S_holder or the X_holder of a data item aborts, it releases the lock it held, and no new retainer is introduced.
3) When an S_retainer (X_retainer) of a data item commits, the parent of that retainer, if any, becomes an S_retainer (X_retainer) of that item, if it is not already one.
4) When an S_retainer (X_retainer) commits or aborts, it ceases to be an S_retainer (X_retainer) for that data item.
As an example, consider the transaction tree of Fig. 7. When E commits, C becomes an S_retainer (X_retainer) for all data items for which E was an S_holder (X_holder). When C commits, A becomes an S_retainer (X_retainer) for all data items for which C was an S_holder (X_holder) or S_retainer (X_retainer).
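The retainer rules amount to promoting lock ownership up the tree on commit; a sketch (Python; S and X modes are not distinguished here, and all names are ours):

```python
def commit(txn, parent, holders, retainers):
    """Rules 1), 3), 4): when txn commits, it releases every lock it
    holds and every retention it has; for each such data item its
    parent (if any) becomes a retainer, unless it already is one
    (the set semantics handle that). Rule 2), abort, would release
    without creating a retainer. One (holders, retainers) set pair
    per data item."""
    p = parent.get(txn)
    for hs, rs in zip(holders, retainers):
        if txn in hs or txn in rs:
            hs.discard(txn)      # rule 1): release the held lock
            rs.discard(txn)      # rule 4): cease to be a retainer
            if p is not None:
                rs.add(p)        # rules 1) and 3): parent retains
```

Replaying the Fig. 7 example: committing E turns C into the retainer of E's items, and committing C then turns A into their retainer.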
Note that there can be several S_retainers and X_retainers for a data item simultaneously. Even though there can be only one X_holder for a data item at any time, multiple X_retainers arise because a transaction tree grows and shrinks dynamically as nested transactions are created, committed, or aborted. Because of this, it is also possible that a transaction is a retainer as well as a holder of a data item simultaneously.

With the introduction of retainers, we can now restate the rules for granting locks as follows.
1) If a transaction T requests an S_lock on a data item, it can be granted if there is no X_holder for the item, and either
a) there is no X_retainer for the item, or
b) each X_retainer is either T, or an ancestor of T.
The presence of an S_holder or an S_retainer does not forbid the grant of an S_lock.
2) If a transaction T requests an X_lock on a data item, it can be granted if there is no S_holder or X_holder for the item, and either
a) there is no S_retainer or X_retainer for the item, or
b) each S_retainer (X_retainer) is either T, or an ancestor of T.
For example, suppose in the transaction tree of Fig. 7, F requests an S_lock for a data item for which E is an X_holder. The S_lock can be granted to F only when either there is no X_retainer or X_holder for the item, or A becomes the only X_retainer for the item, i.e., when C, D, and E commit or abort.
When an S_holder releases the lock and this introduces an S_retainer for the data item, it may result in the simultaneous scheduling of a descendant X_requester (if any). Similarly, when an X_holder releases the lock and this introduces an X_retainer for the data item, it may result in the simultaneous scheduling of a descendant X_requester (if any), or of one or more descendant S_requesters (if any).
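The lock-granting rules 1) and 2) above can be sketched as a single predicate (Python; the argument names and the ancestor test are our own illustration):

```python
def can_grant(t, mode, x_holder, s_holders, x_retainers, s_retainers,
              is_ancestor):
    """Lock-granting rules with retainers. A retainer blocks the
    grant unless it is t itself or an ancestor of t; is_ancestor(a, b)
    tests whether a is a proper ancestor of b."""
    permitted = lambda r: r == t or is_ancestor(r, t)
    if mode == 'S':
        # rule 1): no X_holder, and every X_retainer is t or an ancestor
        return x_holder is None and all(permitted(r) for r in x_retainers)
    # rule 2): no holder at all, and every retainer is t or an ancestor
    return (x_holder is None and not s_holders
            and all(permitted(r) for r in x_retainers | s_retainers))

# Ancestor test over the Fig. 7 tree (assumed shape).
tree = {'A': None, 'B': 'A', 'C': 'A', 'D': 'C', 'E': 'C', 'F': 'B'}
def is_anc(a, b):
    while tree.get(b) is not None:
        b = tree[b]
        if a == b:
            return True
    return False
```

For the example above: while E holds the item in X_mode, F's S_lock request is refused; once A is the only X_retainer, the request can be granted because A is an ancestor of F.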
B. Nested Transactions and Deadlock Detection
and Resolution
We shall now discuss the scheme for detecting deadlocks that
can arise in the nested transaction model described above. The
basic detection algorithm needs to be modified, in order to
take into account the fact that a transaction now waits for its
descendants to commit/abort. As in the basic algorithm, we
shall use priorities for transactions in order to determine when
to initiate a deadlock computation, as well as for deadlock
resolution. Timestamps induce priorities among transactions
as described earlier. However, the scheme for assigning timestamps needs to be modified to take into account nested
transactions.
When a nonnested transaction (i.e., the root of a tree) is
created, a (C, i) pair is generated as described in Section II, and
this pair is assigned as the timestamp of the transaction. When
a nested transaction is created, a (C, i) pair is generated, and a
timestamp is generated for the transaction by concatenating
this (C, i) pair with the timestamp of the parent transaction.
Thus, the timestamp of a nested transaction is a sequence of
(C, i) pairs, the length of the sequence being determined by
the depth of nesting. Based on the ordering on (C, i) pairs described in Section II, timestamps of transactions are totally
ordered in the following way.
Given two timestamps X and Y of the form X1 X2 · · · Xm and Y1 Y2 · · · Yn, respectively, where each Xi or Yi is a (C, i) pair, their relation is defined as follows.

X is greater than Y if either
1) m > n, and for all i, 1 <= i <= n, Xi = Yi, or
2) for some i, 1 <= i <= min(m, n), X1 = Y1, X2 = Y2, · · ·, X(i-1) = Y(i-1), and Xi > Yi.

Note that in this order, the priority of a transaction is higher than that of its descendants.
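This total order coincides with lexicographic comparison of the (C, i) sequences, which Python's list comparison implements directly; a minimal illustration (the pair values below are made up):

```python
# Timestamps as lists of (C, i) pairs. Python's list comparison
# realizes the order defined above: if Y equals a proper prefix of X,
# then X > Y (case 1); otherwise the first differing pair decides
# (case 2).
parent_ts = [(5, 1)]
child_ts = [(5, 1), (9, 2)]   # parent's timestamp + a new (C, i) pair
other_ts = [(6, 1)]

assert child_ts > parent_ts   # case 1: a descendant's timestamp is
                              # greater, so the ancestor has higher priority
assert other_ts > parent_ts   # case 2: first pair (6, 1) > (5, 1)
```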
Deadlock Detection: We now extend the deadlock detection
algorithm described in Section V-A, to take into account
nested transactions.
The probe_Q of a data manager is split into S_probe_Q and X_probe_Q: the former stores the probes received from S_requesters, and the latter stores the probes received from X_requesters. A transaction has only one probe_Q.
1) If a data manager cannot grant a lock requested by a transaction, it acts as follows.

if the lock request of a transaction, T, cannot be honored
then begin
  for each X_retainer and the X_holder (if any), Tx,
  do
    if Tx < T
    then initiate probe(T, Tx) and send it to Tx;
  if an X_lock is requested
  then for each S_retainer and each S_holder, Ts,
       do
         if Ts < T
         then initiate probe(T, Ts) and send it to Ts
end;
Note that in no case will a transaction send a probe to its ancestor since an ancestor always has higher priority.
2) When a transaction begins to wait for a data item, or for
its children to commit/abort, it transmits each probe in its
probe_Q to the data manager, or to its children.
3) When a transaction T receives a probe P, it performs the following.

if P.junior > T then P.junior := T;
save P in the probe_Q;
if T is waiting for its children to commit/abort
then transmit a copy of the saved probe to each child
else if T is waiting for a data item
     then transmit a copy of the saved probe to the data manager;
4) When a data manager receives a probe P from a transaction T, it acts as follows.

if T is waiting for an S_lock
then save the probe in S_probe_Q
else save the probe in X_probe_Q;
if P.initiator is either a retainer or the holder, or P.initiator is a descendant of a retainer or of the holder
then declare deadlock and initiate deadlock resolution
else begin
  for each X_retainer and the X_holder (if any), Tx,
  do
    if P.initiator > Tx
    then propagate the probe P to Tx;
  if T is waiting for an X_lock
  then for each S_retainer and each S_holder (if any), Ts,
       do
         if P.initiator > Ts
         then propagate the probe P to Ts
end;
5) When a new retainer or holder is introduced for a data item, the data manager acts as follows. (Note that when a new retainer is introduced, the data manager may have simultaneously scheduled a descendant X_requester, or one or more descendant S_requesters; i.e., the introduction of a new retainer may result in the simultaneous introduction of new holder(s) as well.)

if an S_holder or an S_retainer, Ts, is introduced then
begin
  for each requester, T, in Xrequest_Q
  do
    if T > Ts
    then initiate probe(T, Ts) and send it to Ts;
  for each probe, P, in X_probe_Q
  do
    if P.initiator > Ts
    then send a copy of P to Ts
end
else % an X_holder or an X_retainer, Tx, is introduced
begin
  for each requester, T, in Srequest_Q or Xrequest_Q
  do
    if T > Tx
    then initiate probe(T, Tx) and send it to Tx;
  for each probe, P, in S_probe_Q or X_probe_Q
  do
    if P.initiator > Tx
    then send a copy of P to Tx
end;
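Of the steps above, the transaction-side handling in step 3) is the simplest to sketch (Python; the dataclasses and the send callback are our own illustration, with a larger numeric value standing for lower priority only through the junior comparison, exactly as written in the step):

```python
from dataclasses import dataclass, field

@dataclass
class Probe:
    initiator: int
    junior: int

@dataclass
class Txn:
    prio: int
    children: list = field(default_factory=list)
    probe_Q: list = field(default_factory=list)

def on_probe_step3(t, probe, waiting_for_children, waiting_item, send):
    """Step 3): T lowers the probe's junior field to its own priority
    if T is now the lowest-priority transaction on the path, saves the
    probe, and forwards copies to its children (if it awaits their
    commit/abort) or to the data manager it waits on."""
    if probe.junior > t.prio:
        probe.junior = t.prio
    t.probe_Q.append(probe)
    if waiting_for_children:
        for c in t.children:
            send(c, probe)
    elif waiting_item is not None:
        send(waiting_item, probe)
```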
In this extended algorithm, it is possible that a transaction
may receive more than one probe with the same value for initiator. This may arise because the transaction as well as some
of its ancestors may be retainers or holders for a data item simultaneously. In such cases, the transaction needs to process
only the probe that it receives first, and it may discard others.
In Section VII, we discuss this issue again.
Deadlock Resolution and Post-Resolution Computation: As
in the basic algorithm, we abort only the lowest priority transaction to resolve the deadlock. However, the scheme for handling clean messages requires some modifications as given below.
1) When a transaction T receives a clean message, it acts as follows.

if T is in wait state
then if T = initiator
     then discard the clean message
     else if T is waiting for its children
          then propagate a copy of the clean message to every child
          else propagate the clean message to the data manager where it is waiting;
2) When a data manager receives a clean message, it updates
its S_probe_Q and X_probe_Q, and propagates the message
to all holders and retainers.
Fig. 8. A deadlock cycle with nested transactions.
An Illustrative Example: Let us illustrate the working of this extended algorithm for detecting deadlocks through an example.

Consider the scenario shown in Fig. 8. A transaction T1 requests an X_lock for the data item Obj1. The lock cannot be granted since another transaction T2 is an X_holder for Obj1. T2 has created a child T21 and is waiting for T21 to commit. T21 is waiting for an S_lock on another data item Obj2, which has T1 as an X_retainer. (T1 had earlier created a child T11 which held the item Obj2 in X_mode and has since committed.)

In the above situation, a deadlock T1_T2_T21_T1 occurs when T1 begins to wait for Obj1. Let us illustrate how this deadlock is detected. We consider two possible cases.
Case 1: T1 > T2. By definition, it follows that T1 > T21. When the data manager of Obj1, D1, receives the lock request from T1, it originates probe(T1, T2) and sends it to T2. When T2 receives this probe, it saves the probe in its probe_Q and propagates it to its child T21.

When T21 receives probe(T1, T2), it modifies it to probe(T1, T21), saves it in its probe_Q, and propagates it to D2, the data manager of Obj2.

When D2 receives probe(T1, T21), it detects a deadlock since the initiator of the probe, T1, is an X_retainer for the item. The deadlock is resolved by aborting T21.
Case 2: T2 > T1. By definition, it follows that T21 > T1. Before T1 issues its X_lock request for the data item Obj1, its probe_Q contains probe(T21, T1). This is due to the fact that when D2 cannot grant the S_lock to T21, it initiates probe(T21, T1) and sends it to T1. Upon receiving this probe, T1 saves it in its probe_Q.

When T1 waits for an X_lock on Obj1, it propagates probe(T21, T1), contained in its probe_Q, to D1.

Upon receiving probe(T21, T1), D1 detects a deadlock since the initiator of the probe, T21, is a descendant of T2, which is the X_holder of Obj1. The deadlock is resolved by aborting T1.
C. Comparison to Related Work

Moss [13] has also proposed an edge-chasing algorithm for detecting deadlocks that takes nested transactions into account. As described earlier, a major difference between his algorithm and ours is that in Moss' scheme, probes are not stored within transactions and data managers, and his scheme relies on periodic retransmission of probes to ensure eventual detection of deadlocks. Apart from this, in Moss' scheme, a data manager sends a probe not to the holders of the item, but always to the "potential" retainers. Because of this, his algorithm is prone to detecting phantom or false deadlocks.
For example, consider the scenario shown in Fig. 9. There are two transactions T1 and T2, where T1 > T2. T2 has created two children, T21 and T22. T1 waits for an X_lock on an item
Fig. 9. Moss' scheme: phantom deadlock.
Obj1, which has T21 as the X_holder. T22 is waiting for an X_lock on another item Obj2, which has T1 as the X_holder. T21 is active.

Given this situation, a deadlock occurs only when T21 commits. If T21 aborts of its own accord, say due to some application considerations, no deadlock results. However, in Moss' scheme, when T1's request arrives, D1 sends a probe to T2 even while T21 is active. T2 propagates this probe to T21 and T22. T21 ignores this probe since it is active. But T22 propagates it to D2, which detects a deadlock. Meanwhile, if T21 aborts, the detected deadlock is a false deadlock. In our scheme, no such false deadlock will be detected since D1 sends a probe to T2 only when it becomes an X_retainer (i.e., when T21 commits).
In general, however, our scheme may also detect phantom deadlocks, but such a detected deadlock is false only if a waiting transaction aborts, explicitly (on a user's request) or implicitly (due to a site crash), after the cycle-detecting probe has traversed through it, and not otherwise.
VII. DISCUSSION
A. Delaying the Initiation of a Probe
Currently in our algorithm, a data manager initiates a probe as soon as it finds an antagonistic conflict at its site. But an antagonistic conflict is a potential deadlock situation only if the holder transaction is in the wait state, and not otherwise. Hence, the initiation and propagation of the probe can be delayed until the holder enters the wait state. We suggest that a data manager, upon the occurrence of an antagonistic conflict, should wait for a specific time period and only then initiate the probe and send it to the holder. Similarly, the propagation of probes received by a data manager can be delayed.
B. Dynamic Assignment of Priorities

Another orthogonal technique that can be incorporated to improve performance is to assign a priority to a transaction only on a demand basis, and not a priori. As long as a transaction does not get into conflict with a transaction in the wait state, it need not be assigned a priority. Whenever a conflict arises with a waiting transaction, transactions must be assigned priorities, if possible, in such a way that the conflict is nonantagonistic. Otherwise, an antagonistic conflict has occurred and a probe is initiated. Now, a transaction to which a priority has not been assigned never causes an antagonistic conflict. Thus, by employing a scheme for dynamic assignment of priorities [1], the occurrence of antagonistic conflicts, and consequently the initiation of probes, can be reduced still further.
C. Other Mechanisms for Assigning Priorities
In our algorithm, we have used timestamps for assigning priorities. However, our scheme is applicable even if some other
mechanism is used for assigning priorities. The only requirement is that the mechanism must induce a total order on transactions. For example, the number of resources held by a transaction can be used to assign a priority for it. To guarantee
uniqueness, we may append the timestamp of the transaction
to the number of resources held. Notice that in this scheme,
the priority of an active transaction changes dynamically as it
acquires resources, but if a transaction is in wait state its priority does not change. Because of this, the nature of a conflict
(antagonistic or otherwise) does not change dynamically, and
hence, our algorithm is applicable to this dynamic priority
scheme as well.
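One concrete encoding of this scheme is to compare (resources held, age) pairs, with the timestamp appended only to break ties; the direction of the tie-break (older transaction wins) is our assumption, as the text only requires uniqueness:

```python
def priority(resources_held, timestamp):
    """Priority key: more resources held means higher priority; the
    timestamp (smaller = older = higher priority, an assumed
    tie-break) guarantees uniqueness. Returned so that larger tuples
    compare as higher priority."""
    return (resources_held, -timestamp)
```

An active transaction's priority key rises as it acquires resources; a waiting transaction's resource count is frozen, so its priority does not change while it waits, which is what keeps the nature of a conflict stable.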
D. Avoidance of Phantom Deadlocks
In our algorithm, if a waiting transaction which is a component of a deadlock cycle aborts (due to a site crash, the abort of its parent or a child, or a user request) after the detecting probe has traversed it, we may find a phantom deadlock. Since a situation of this kind is unpredictable, our algorithm comes as close as possible to avoiding the detection of phantom deadlocks. The possibility of phantom deadlock can be reduced even further if the victim transaction does not abort itself until the clean message it initiated returns to it after circulating through the entire deadlock cycle. This requires a clean message to traverse beyond the initiator (note that in the algorithm described in Section III, the clean message does not go beyond the initiator).
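A minimal sketch of this strengthened rule, under assumptions of our own (the message tuple shape, the `dying` flag, and the `victim_step` handler are all illustrative): the victim marks itself as dying, circulates its clean message, and actually aborts only when its own clean message comes back around the cycle.

```python
def victim_step(state: dict, msg: tuple) -> str:
    """One step of the victim's behavior under the deferred-abort rule.

    state holds the victim's transaction id and a 'dying' flag;
    msg is ("clean", origin). Purely illustrative message handling.
    """
    kind, origin = msg
    if kind == "clean" and origin == state["tid"] and state["dying"]:
        state["dying"] = False
        return "abort"        # own clean message completed the full cycle
    return "forward"          # someone else's clean message: pass it on

v = {"tid": "T3", "dying": True}
assert victim_step(v, ("clean", "T7")) == "forward"
assert victim_step(v, ("clean", "T3")) == "abort"
```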
E. Discarding Duplicate Probes
In our basic algorithm, there is a possibility that some probes may circulate through a deadlock cycle more than once. Suppose, for example, that a transaction which is not part of a deadlock cycle, but which waits (perhaps transitively) for a member transaction of the cycle, inserts a probe into the cycle. If the outside transaction is of lower priority than the highest priority transaction of the cycle, the inserted probe ceases to propagate at some point in the cycle. On the other hand, if the outside transaction is of higher priority than the highest priority transaction of the cycle, the inserted probe propagates through the entire cycle and keeps circulating until the deadlock cycle is broken. (Note that a probe never propagates through the entire cycle if its initiator is a member of the cycle.)
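The duplicate-discard modification this section introduces (a transaction discards a probe whose initiator already appears in its probe_Q) can be sketched as follows; the `Probe` record and the list-based queue are illustrative, not the paper's data structures.

```python
from typing import NamedTuple

class Probe(NamedTuple):
    initiator: str  # transaction that initiated the probe
    junior: str     # lowest-priority transaction seen so far (illustrative)

def receive_probe(probe_q: list, probe: Probe) -> bool:
    """Save the probe unless one with the same initiator is already queued.

    Returns True if the probe was accepted, False if discarded as a
    duplicate (i.e., it has already circulated through this transaction).
    """
    if any(p.initiator == probe.initiator for p in probe_q):
        return False          # duplicate external probe: discard
    probe_q.append(probe)     # first visit: save and keep propagating
    return True

q = []
assert receive_probe(q, Probe("Tx", "T2")) is True
assert receive_probe(q, Probe("Tx", "T5")) is False  # same initiator
assert receive_probe(q, Probe("T1", "T2")) is True   # different initiator
```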
For example, consider the configuration (an extension of the configuration given in Fig. 2) shown in Fig. 10. Here the transaction T2 has acquired X_locks on data items Obj2 and Objx before it entered the wait state. A transaction Tx, which is not a member of the deadlock cycle (called an external transaction), requests a lock on Objx and waits. For simplicity, we assume that Tx enters the wait state after the deadlock cycle T1-TN-T1 is formed.
If Tx > T2 (but not otherwise), the data manager Dx will initiate probe(Tx, T2) and send it to the holder T2. Now, a probe initiated by an external transaction (called an external probe) enters the deadlock cycle. T2 will save the probe in its probe_Q and, since it is waiting for Obj1, will propagate probe(Tx, T2) to D1.

Fig. 10. Propagation of an external probe in a deadlock cycle.

If T1 > Tx, i.e., the external transaction's priority is lower than that of the highest priority transaction of the cycle, D1 will discard the probe. On the other hand, if Tx > T1, D1 will propagate the probe to T1. Once this probe has crossed over the highest priority transaction of the deadlock cycle, it will cover the entire cycle and will be saved in the probe_Qs of all member transactions (and data managers). This is correct, since the external transaction Tx waits directly or transitively on all member transactions of the deadlock cycle. But since Tx > T1, the probe will keep circulating the cycle indefinitely (until the cycle is broken), and a member transaction may receive a probe whose initiator is the initiator of some probe already stored in its probe_Q. Such a probe can be considered a duplicate, and it should be discarded. To discard these duplicate probes, the following modification to the basic algorithm is needed.

When a transaction receives a probe from a data manager, it discards the probe if there exists a probe in its probe_Q which has an identical initiator.

F. Fair Scheduling of Exclusive Locks

The policy discussed in Section V, of granting an S_lock request when an X_lock request is already pending, is unfair to X_requesters. A fair scheduling policy would be as follows.

When a transaction T requests an S_lock, it is granted if there is no X_holder and no X_requester of higher priority than T.

Such a scheme ensures that an X_requester will never encounter antagonistic conflicts incrementally. However, even in this case, S_holders are introduced incrementally, and to take into account transitive waits on these additional S_holders, we need to maintain probe_Qs within data managers. Further, an S_requester may now encounter antagonistic conflicts with some S_holders, and in such cases probes must be sent to those S_holders.

We must point out here that this fair scheduling policy is not directly applicable in the case of nested transactions, since we also have to take retainers into account. For example, suppose that for some data item there is a retainer Tr and an X_requester Tx, and let us assume that Tx > Tr. Now, when a descendant of Tr requests an S_lock, it must be granted, even though its priority is less than that of Tx. Otherwise, Tx waits for Tr, which waits for its descendant to commit, and the latter waits for Tx, resulting in a deadlock. Hence, in the case of nested transactions, the above fair scheduling policy can be enforced only when no ancestor of the requesting transaction is a retainer (S_retainer or X_retainer) of the data item. Thus, in this case, an X_requester may encounter antagonistic conflicts incrementally.

G. Computation of Cycle Length

Since we use an edge-chasing algorithm, it is quite simple to compute the length of a deadlock cycle. For this purpose, a probe should have an additional parameter, say length (l), which is initially set to one. When a transaction receives a probe P, it increments P.l by one before saving it in its probe_Q. If, on receiving a probe P, a data manager detects a deadlock, then the value of P.l gives the length of the deadlock cycle.

H. Voluntary Abort by a Transaction

Though the algorithm is designed for the detection and resolution of deadlocks, it can also be used by transactions to abort voluntarily rather than wait until a deadlock cycle is formed, detected, and resolved. When a transaction receives a probe P, it can decide to abort voluntarily on either of two conditions: 1) a transaction with very high priority waits for it directly or transitively, or 2) the value of P.l is very high, i.e., a long wait-for chain has already formed.

ACKNOWLEDGMENT

The authors thank the referee for his comments and suggestions. They are also thankful to Prof. K. Mani Chandy and Prof. M. Stonebraker for their helpful discussions.

REFERENCES

[1] R. Bayer, K. Elhardt, J. Heigert, and A. Reiser, "Dynamic timestamp allocation for transactions in database systems," in Distributed Databases, H. J. Schneider, Ed. Amsterdam, The Netherlands: North-Holland, 1982, pp. 9-20.
[2] P. A. Bernstein and N. Goodman, "Concurrency control in distributed database systems," ACM Comput. Surveys, vol. 13, pp. 185-221, June 1981.
[3] K. M. Chandy and J. Misra, "A distributed algorithm for detecting resource deadlocks in distributed systems," in Proc. ACM SIGACT-SIGOPS Symp. Principles of Distributed Computing, Ottawa, Ont., Canada, Aug. 1982.
[4] K. M. Chandy, J. Misra, and L. M. Haas, "Distributed deadlock detection," ACM Trans. Comput. Syst., vol. 1, pp. 144-156, May 1983.
[5] E. G. Coffman, Jr., M. J. Elphick, and A. Shoshani, "System deadlocks," ACM Comput. Surveys, vol. 3, pp. 66-78, June 1971.
[6] C. T. Davies, "Recovery semantics for a DB/DC system," in Proc. ACM Nat. Conf., vol. 28, 1973, pp. 136-141.
[7] K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger, "The notion of consistency and predicate locks in a database system," Commun. ACM, vol. 19, pp. 624-633, Nov. 1976.
[8] V. D. Gligor and S. H. Shattuck, "On deadlock detection in distributed systems," IEEE Trans. Software Eng., vol. SE-6, pp. 435-440, Sept. 1980.
[9] J. N. Gray, "Notes on database operating systems," in Operating Systems, An Advanced Course (Lecture Notes in Computer Science 60). Berlin, Germany: Springer-Verlag, 1978, pp. 398-481.
[10] R. C. Holt, "Some deadlock properties of computer systems," ACM Comput. Surveys, vol. 4, pp. 179-195, Dec. 1972.
[11] L. Lamport, "Time, clocks, and the ordering of events in a distributed system," Commun. ACM, vol. 21, pp. 558-565, July 1978.
[12] D. A. Menasce and R. R. Muntz, "Locking and deadlock detection in distributed databases," IEEE Trans. Software Eng., vol. SE-5, pp. 195-202, May 1979.
[13] J. E. B. Moss, "Nested transactions: An approach to reliable distributed computing," Lab. Comput. Sci., Massachusetts Inst. Technol., Cambridge, MA, Tech. Rep. 260, Apr. 1981.
[14] N. Natarajan, "Communication and synchronization in distributed programs," Ph.D. dissertation, National Centre for Software Development and Computing Techniques, Tata Inst. Fundamental Res., Bombay, India, Nov. 1983.
[15] R. Obermarck, "Distributed deadlock detection algorithm," ACM Trans. Database Syst., vol. 7, pp. 187-208, June 1982.
[16] D. J. Rosenkrantz, R. E. Stearns, and P. M. Lewis, "System level concurrency control for distributed database systems," ACM Trans. Database Syst., vol. 3, pp. 178-198, June 1978.

Mukul K. Sinha was born in Patna, India, on September 27, 1950. He received the B.Sc. (Engineering) degree in electrical engineering from Bihar Institute of Technology, Sindri, India, in 1968, the M.Tech. degree in electrical engineering from Indian Institute of Technology, Kanpur, India, in 1971, and the Ph.D. degree in computer science from the University of Bombay, Bombay, India, in 1983.
He is currently working as a Scientific Officer at the National Centre for Software Development and Computing Techniques, Bombay. From September 1979 to August 1980, he was a Visiting Engineer in the Computer Systems Research Group at Massachusetts Institute of Technology, where he worked on concurrency control problems in distributed systems. He has designed and implemented various systems, which include compilers, general purpose graphics systems, multiprocessor operating systems, and a file server for a local area network. His current research interests are operating systems, database concurrency control, and local area networks.

N. Natarajan was born in Madras, India, on June 28, 1950. He received the B.E. (Hons.) degree in electronics and communication engineering from the University of Madras, Madras, in 1972, the M.E. degree in automation from Indian Institute of Science, Bangalore, India, in 1974, and the Ph.D. degree in computer science from the University of Bombay, Bombay, India, in 1983.
He has been working with the National Centre for Software Development and Computing Techniques, Tata Institute of Fundamental Research, Bombay, since 1974, where he has worked on compilers, an operating system for a multiprocessor, and the design of a local area network. He visited the Laboratory for Computer Science, Massachusetts Institute of Technology, during 1979-1980. His research interests include operating systems, programming languages, computer networks, and distributed systems.
Timing Constraints of Real-Time Systems:
Constructs for Expressing Them,
Methods of Validating Them
B. DASARATHY, MEMBER, IEEE
Abstract—This paper examines timing constraints as features of real-time systems. It investigates the various constructs required in requirements languages to express timing constraints and considers how automatic test systems can validate systems that include timing constraints. Specifically, features needed in test languages to validate timing constraints are discussed. One of the distinguishing aspects of three tools developed at GTE Laboratories for real-time systems specification and testing is their extensive ability to handle timing constraints. Thus, the paper highlights the timing constraint features of these tools.
Index Terms—Real-time systems, requirements specification, test generation, test language, timing constraints, validation.

INTRODUCTION

DURING the past decade there has been great progress in the development of requirements languages; that is, formal languages for expressing the requirements of systems [9]. In particular, researchers have shown an interest in languages for expressing the requirements of real-time systems. Examples of such languages are REVS' RSL [1], [2], [7], CCITT's System Description Language (SDL) [5], Zave's PAISLey [13], and GTE Laboratories' Real-Time Requirements Language (RTRL) [10].

SDL, RSL, and RTRL share a common view of real-time systems. They hold that a real-time system (or the ports it serves) can be modeled as finite-state machines (FSM's) in which a response at any instance is completely determined by the system's present state and the stimulus that has arrived. The behavior of the system is captured in transitions made from one state to another state on a stimulus. PAISLey has a more general view of a real-time system in that it allows both the system and its environment to be modeled as interacting

Manuscript received July 29, 1983.
The author is with GTE Laboratories, Inc., Waltham, MA 02254.

0098-5589/85/0100-0080$01.00 © 1985 IEEE