8000 Make _bt_killitems drop pins it acquired itself. · postgres/postgres@38c8d29 · GitHub
[go: up one dir, main page]

Skip to content

Commit 38c8d29

Browse files
Make _bt_killitems drop pins it acquired itself.
Teach nbtree's _bt_killitems to leave the so->currPos page that it sets LP_DEAD items on in whatever state it was in when _bt_killitems was called. In particular, make sure that so->dropPin scans don't acquire a pin whose reference is saved in so->currPos.buf. Allowing _bt_killitems to change so->currPos.buf like this is wrong. The immediate consequence of allowing it is that code in _bt_steppage (that copies so->currPos into so->markPos) will behave as if the scan is a !so->dropPin scan. so->markPos will therefore retain the buffer pin indefinitely, even though _bt_killitems only needs to acquire a pin (along with a lock) for long enough to mark known-dead items LP_DEAD. This issue came to light following a report of a failure of an assertion from recent commit e6eed40. The test case in question involves the use of mark and restore. An initial call to _bt_killitems takes place that leaves so->currPos.buf in a state that is inconsistent with the scan being so->dropPin. A subsequent call to _bt_killitems for the same position (following so->currPos being saved in so->markPos, and then restored as so->currPos) resulted in the failure of an assertion that tests that so->currPos.buf is InvalidBuffer when the scan is so->dropPin (non-assert builds got a "resource was not closed" WARNING instead). The same problem exists on earlier releases, though the issue is far more subtle there. Recent commit e6eed40 introduced the so->dropPin field as a partial replacement for testing so->currPos.buf directly. Earlier releases won't get an assertion failure (or buffer pin leak), but they will allow the second _bt_killitems call from the test case to behave as if a buffer pin was consistently held since the original call to _bt_readpage. This is wrong; there will have been an initial window during which no pin was held on the so->currPos page, and yet the second _bt_killitems call will neglect to check if so->currPos.lsn continues to match the page's now-current LSN. As a result of all this, it's just about possible that _bt_killitems will set the wrong items LP_DEAD (on release branches). This could only happen with merge joins (the sole user of nbtree mark/restore support), when a concurrently inserted index tuple used a recently-recycled TID (and only when the new tuple was inserted onto the same page as a distinct concurrently-removed tuple with the same TID). This is exactly the scenario that _bt_killitems' check of the page's now-current LSN against the LSN stashed in currPos was supposed to prevent. A follow-up commit will make nbtree completely stop conditioning whether or not a position's pin needs to be dropped on whether the 'buf' field is set. All call sites that might need to drop a still-held pin will be taught to rely on the scan-level so->dropPin field recently introduced by commit e6eed40. That will make bugs of the same general nature as this one impossible (or make them much easier to detect, at least). Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/545be1e5-3786-439a-9257-a90d30f8b849@gmail.com Backpatch-through: 13
1 parent f09fea3 commit 38c8d29

File tree

1 file changed

+20
-14
lines changed

1 file changed

+20
-14
lines changed

src/backend/access/nbtree/nbtutils.c

Lines changed: 20 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1706,9 +1706,9 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
17061706
* current page and killed tuples thereon (generally, this should only be
17071707
* called if so->numKilled > 0).
17081708
*
1709-
* The caller does not have a lock on the page and may or may not have the
1710-
* page pinned in a buffer. Note that read-lock is sufficient for setting
1711-
* LP_DEAD status (which is only a hint).
1709+
* Caller should not have a lock on the so->currPos page, but may hold a
1710+
* buffer pin. When we return, it still won't be locked. It'll continue to
1711+
* hold whatever pins were held before calling here.
17121712
*
17131713
* We match items by heap TID before assuming they are the right ones to
17141714
* delete. We cope with cases where items have moved right due to insertions.
@@ -1740,7 +1740,8 @@ _bt_killitems(IndexScanDesc scan)
17401740
int i;
17411741
int numKilled = so->numKilled;
17421742
bool killedsomething = false;
1743-
bool droppedpin PG_USED_FOR_ASSERTS_ONLY;
1743+
bool droppedpin;
1744+
Buffer buf;
17441745

17451746
Assert(BTScanPosIsValid(so->currPos));
17461747

@@ -1759,29 +1760,31 @@ _bt_killitems(IndexScanDesc scan)
17591760
* LSN.
17601761
*/
17611762
droppedpin = false;
1762-
LockBuffer(so->currPos.buf, BT_READ);
1763-
1764-
page = BufferGetPage(so->currPos.buf);
1763+
buf = so->currPos.buf;
1764+
LockBuffer(buf, BT_READ);
17651765
}
17661766
else
17671767
{
1768-
Buffer buf;
1768+
XLogRecPtr latestlsn;
17691769

17701770
droppedpin = true;
17711771
/* Attempt to re-read the buffer, getting pin and lock. */
17721772
buf = _bt_getbuf(scan->indexRelation, so->currPos.currPage, BT_READ);
17731773

1774-
page = BufferGetPage(buf);
1775-
if (BufferGetLSNAtomic(buf) == so->currPos.lsn)
1776-
so->currPos.buf = buf;
1777-
else
1774+
latestlsn = BufferGetLSNAtomic(buf);
1775+
Assert(!XLogRecPtrIsInvalid(so->currPos.lsn));
1776+
Assert(so->currPos.lsn <= latestlsn);
1777+
if (so->currPos.lsn != latestlsn)
17781778
{
17791779
/* Modified while not pinned means hinting is not safe. */
17801780
_bt_relbuf(scan->indexRelation, buf);
17811781
return;
17821782
}
1783+
1784+
/* Unmodified, hinting is safe */
17831785
}
17841786

1787+
page = BufferGetPage(buf);
17851788
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
17861789
minoff = P_FIRSTDATAKEY(opaque);
17871790
maxoff = PageGetMaxOffsetNumber(page);
@@ -1897,10 +1900,13 @@ _bt_killitems(IndexScanDesc scan)
18971900
if (killedsomething)
18981901
{
18991902
opaque->btpo_flags |= BTP_HAS_GARBAGE;
1900-
MarkBufferDirtyHint(so->currPos.buf, true);
1903+
MarkBufferDirtyHint(buf, true);
19011904
}
19021905

1903-
LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
1906+
if (!droppedpin)
1907+
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
1908+
else
1909+
_bt_relbuf(scan->indexRelation, buf);
19041910
}
19051911

19061912

0 commit comments

Comments
 (0)
0