8000 Avoid deadlocks during insertion into SP-GiST indexes. · kosalalakshitha/postgres@cbabf50 · GitHub
[go: up one dir, main page]

Skip to content

Commit cbabf50

Browse files
committed
Avoid deadlocks during insertion into SP-GiST indexes.
SP-GiST's original scheme for avoiding deadlocks during concurrent index insertions doesn't work, as per report from Hailong Li, and there isn't any evident way to make it work completely. We could possibly lock individual inner tuples instead of their whole pages, but preliminary experimentation suggests that the performance penalty would be huge. Instead, if we fail to get a buffer lock while descending the tree, just restart the tree descent altogether. We keep the old tuple positioning rules, though, in hopes of reducing the number of cases where this can happen. Teodor Sigaev, somewhat edited by Tom Lane
1 parent 7e0b9ed commit cbabf50

File tree

4 files changed

+71
-18
lines changed

4 files changed

+71
-18
lines changed

src/backend/access/spgist/README

Lines changed: 25 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -201,17 +201,31 @@ space utilization, but doesn't change the basis of the algorithm.
201201
CONCURRENCY
202202

203203
While descending the tree, the insertion algorithm holds exclusive lock on
204-
two tree levels at a time, ie both parent and child pages (parent and child
205-
pages can be the same, see notes above). There is a possibility of deadlock
206-
between two insertions if there are cross-referenced pages in different
207-
branches. That is, if inner tuple on page M has a child on page N while
208-
an inner tuple from another branch is on page N and has a child on page M,
209-
then two insertions descending the two branches could deadlock. To prevent
210-
deadlocks we introduce a concept of "triple parity" of pages: if inner tuple
211-
is on page with BlockNumber N, then its child tuples should be placed on the
212-
same page, or else on a page with BlockNumber M where (N+1) mod 3 == M mod 3.
213-
This rule guarantees that tuples on page M will have no children on page N,
214-
since (M+1) mod 3 != N mod 3.
204+
two tree levels at a time, ie both parent and child pages (but parent and
205+
child pages can be the same, see notes above). There is a possibility of
206+
deadlock between two insertions if there are cross-referenced pages in
207+
different branches. That is, if inner tuple on page M has a child on page N
208+
while an inner tuple from another branch is on page N and has a child on
209+
page M, then two insertions descending the two branches could deadlock,
210+
since they will each hold their parent page's lock while trying to get the
211+
child page's lock.
212+
213+
Currently, we deal with this by conditionally locking buffers as we descend
214+
the tree. If we fail to get lock on a buffer, we release both buffers and
215+
restart the insertion process. This is potentially inefficient, but the
216+
locking costs of a more deterministic approach seem very high.
217+
218+
To reduce the number of cases where that happens, we introduce a concept of
219+
"triple parity" of pages: if inner tuple is on page with BlockNumber N, then
220+
its child tuples should be placed on the same page, or else on a page with
221+
BlockNumber M where (N+1) mod 3 == M mod 3. This rule ensures that tuples
222+
on page M will have no children on page N, since (M+1) mod 3 != N mod 3.
223+
That makes it unlikely that two insertion processes will conflict against
224+
each other while descending the tree. It's not perfect though: in the first
225+
place, we could still get a deadlock among three or more insertion processes,
226+
and in the second place, it's impractical to preserve this invariant in every
227+
case when we expand or split an inner tuple. So we still have to allow for
228+
deadlocks.
215229

216230
Insertion may also need to take locks on an additional inner and/or leaf page
217231
to add tuples of the right type(s), when there's not enough room on the pages

src/backend/access/spgist/spgdoinsert.c

Lines changed: 30 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1850,9 +1850,13 @@ spgSplitNodeAction(Relation index, SpGistState *state,
18501850
}
18511851

18521852
/*
1853-
* Insert one item into the index
1853+
* Insert one item into the index.
1854+
*
1855+
* Returns true on success, false if we failed to complete the insertion
1856+
* because of conflict with a concurrent insert. In the latter case,
1857+
* caller should re-call spgdoinsert() with the same args.
18541858
*/
1855-
void
1859+
bool
18561860
spgdoinsert(Relation index, SpGistState *state,
18571861
ItemPointer heapPtr, Datum datum, bool isnull)
18581862
{
@@ -1933,12 +1937,32 @@ spgdoinsert(Relation index, SpGistState *state,
19331937
&isNew);
19341938
current.blkno = BufferGetBlockNumber(current.buffer);
19351939
}
1936-
else if (parent.buffer == InvalidBuffer ||
1937-
current.blkno != parent.blkno)
1940+
else if (parent.buffer == InvalidBuffer)
19381941
{
1942+
/* we hold no parent-page lock, so no deadlock is possible */
19391943
current.buffer = ReadBuffer(index, current.blkno);
19401944
LockBuffer(current.buffer, BUFFER_LOCK_EXCLUSIVE);
19411945
}
1946+
else if (current.blkno != parent.blkno)
1947+
{
1948+
/* descend to a new child page */
1949+
current.buffer = ReadBuffer(index, current.blkno);
1950+
1951+
/*
1952+
* Attempt to acquire lock on child page. We must beware of
1953+
* deadlock against another insertion process descending from that
1954+
* page to our parent page (see README). If we fail to get lock,
1955+
* abandon the insertion and tell our caller to start over. XXX
1956+
* this could be improved; perhaps it'd be worth sleeping a bit
1957+
* before giving up?
1958+
*/
1959+
if (!ConditionalLockBuffer(current.buffer))
1960+
{
1961+
ReleaseBuffer(current.buffer);
1962+
UnlockReleaseBuffer(parent.buffer);
1963+
return false;
1964+
}
1965+
}
19421966
else
19431967
{
19441968
/* inner tuple can be stored on the same page as parent one */
@@ -2139,4 +2163,6 @@ spgdoinsert(Relation index, SpGistState *state,
21392163
SpGistSetLastUsedPage(index, parent.buffer);
21402164
UnlockReleaseBuffer(parent.buffer);
21412165
}
2166+
2167+
return true;
21422168
}

src/backend/access/spgist/spginsert.c

Lines changed: 15 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,10 @@ spgistBuildCallback(Relation index, HeapTuple htup, Datum *values,
4343
/* Work in temp context, and reset it after each tuple */
4444
oldCtx = MemoryContextSwitchTo(buildstate->tmpCtx);
4545

46-
spgdoinsert(index, &buildstate->spgstate, &htup->t_self, *values, *isnull);
46+
/* No concurrent insertions can be happening, so failure is unexpected */
47+
if (!spgdoinsert(index, &buildstate->spgstate, &htup->t_self,
48+
*values, *isnull))
49+
elog(ERROR, "unexpected spgdoinsert() failure");
4750

4851
MemoryContextSwitchTo(oldCtx);
4952
MemoryContextReset(buildstate->tmpCtx);
@@ -217,7 +220,17 @@ spginsert(PG_FUNCTION_ARGS)
217220

218221
initSpGistState(&spgstate, index);
219222

220-
spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull);
223+
/*
224+
* We might have to repeat spgdoinsert() multiple times, if conflicts
225+
* occur with concurrent insertions. If so, reset the insertCtx each time
226+
* to avoid cumulative memory consumption. That means we also have to
227+
* redo initSpGistState(), but it's cheap enough not to matter.
228+
*/
229+
while (!spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull))
230+
{
231+
MemoryContextReset(insertCtx);
232+
initSpGistState(&spgstate, index);
233+
}
221234

222235
SpGistUpdateMetaPage(index);
223236

src/include/access/spgist_private.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -651,7 +651,7 @@ extern void spgPageIndexMultiDelete(SpGistState *state, Page page,
651651
OffsetNumber *itemnos, int nitems,
652652
int firststate, int reststate,
653653
BlockNumber blkno, OffsetNumber offnum);
654-
extern void spgdoinsert(Relation index, SpGistState *state,
654+
extern bool spgdoinsert(Relation index, SpGistState *state,
655655
ItemPointer heapPtr, Datum datum, bool isnull);
656656

657657
#endif /* SPGIST_PRIVATE_H */

0 commit comments

Comments
 (0)
0