Avoid early reuse of btree pages, causing incorrect query results. · sureandrew/postgres@9ced013

Commit 9ced013

Avoid early reuse of btree pages, causing incorrect query results.

When we allowed read-only transactions to skip assigning XIDs we introduced the possibility that a fully deleted btree page could be reused. This broke the index link sequence which could then lead to indexscans silently returning fewer rows than would have been correct. The actual incidence of silent errors from this is thought to be very low because of the exact workload required and locking pre-conditions. Fix is to remove pages only if index page opaque->btpo.xact precedes RecentGlobalXmin.

Noah Misch, reviewed and backpatched by Simon Riggs

1 parent 485e12f commit 9ced013

2 files changed, with 7 additions and 5 deletions

src/backend/access/nbtree/README

Lines changed: 6 additions & 4 deletions
@@ -258,13 +258,15 @@ we need to be sure we don't miss or re-scan any items.

 A deleted page can only be reclaimed once there is no scan or search that
 has a reference to it; until then, it must stay in place with its
-right-link undisturbed. We implement this by waiting until all
-transactions that were running at the time of deletion are dead; which is
+right-link undisturbed. We implement this by waiting until all active
+snapshots and registered snapshots as of the deletion are gone; which is
 overly strong, but is simple to implement within Postgres. When marked
 dead, a deleted page is labeled with the next-transaction counter value.
 VACUUM can reclaim the page for re-use when this transaction number is
-older than the oldest open transaction. (NOTE: VACUUM FULL can reclaim
-such pages immediately.)
+older than RecentGlobalXmin. As collateral damage, this implementation
+also waits for running XIDs with no snapshots and for snapshots taken
+until the next transaction to allocate an XID commits.
+(NOTE: VACUUM FULL can reclaim such pages immediately.)

 Reclaiming a page doesn't actually change its state on disk --- we simply
 record it in the shared-memory free space map, from which it will be
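
The reclaim rule in the README hunk above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyPage`, `toy_mark_deleted`, `toy_global_xmin`, and `toy_page_recyclable` are hypothetical names that do not exist in PostgreSQL, the horizon is modelled simply as the oldest xmin among open snapshots, and XID wraparound is ignored. It shows a deleted page being stamped with the next-transaction counter and staying unreclaimable while any snapshot with an older xmin is still open, even one held by a read-only transaction that never allocated an XID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;   /* hypothetical stand-in for a transaction ID */

/* Toy page header: only the fields this sketch needs. */
typedef struct ToyPage {
    bool deleted;       /* page has been unlinked from the tree */
    Xid  deleted_xact;  /* next-transaction counter value at deletion time */
} ToyPage;

/* Stamp a page as deleted with the next-transaction counter value. */
static void toy_mark_deleted(ToyPage *page, Xid next_xid)
{
    page->deleted = true;
    page->deleted_xact = next_xid;
}

/*
 * Oldest xmin among all open snapshots. Falls back to next_xid when no
 * snapshot is open. Plain "<" is used, so XID wraparound is ignored.
 */
static Xid toy_global_xmin(const Xid *snapshot_xmins, size_t n, Xid next_xid)
{
    Xid oldest = next_xid;

    assert(snapshot_xmins != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (snapshot_xmins[i] < oldest)
            oldest = snapshot_xmins[i];
    return oldest;
}

/* Recyclable only when the deletion stamp is strictly older than the horizon. */
static bool toy_page_recyclable(const ToyPage *page, Xid global_xmin)
{
    return page->deleted && page->deleted_xact < global_xmin;
}
```

In this model, a page deleted when the counter stood at 100 is not recyclable while a snapshot with xmin 90 remains open, and becomes recyclable once the oldest remaining snapshot has xmin 120.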

src/backend/access/nbtree/nbtpage.c

Lines changed: 1 addition & 1 deletion
@@ -633,7 +633,7 @@ _bt_page_recyclable(Page page)
      */
     opaque = (BTPageOpaque) PageGetSpecialPointer(page);
     if (P_ISDELETED(opaque) &&
-        TransactionIdPrecedesOrEquals(opaque->btpo.xact, RecentXmin))
+        TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
         return true;
     return false;
 }
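
Besides switching the horizon from RecentXmin to RecentGlobalXmin, the hunk above also tightens the comparison from "precedes or equals" to strict "precedes". The sketch below illustrates that difference with a standalone, wraparound-aware comparison. It is an illustrative reimplementation, not the PostgreSQL source: `Xid`, `xid_is_normal`, `xid_precedes`, and `xid_precedes_or_equals` are hypothetical names, and treating IDs below 3 as special is an assumption modelled on PostgreSQL's reserved transaction IDs.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;   /* hypothetical stand-in for a transaction ID */

/* Assumption: IDs below this value are reserved and compare as plain integers. */
#define FIRST_NORMAL_XID ((Xid) 3)

static bool xid_is_normal(Xid x)
{
    return x >= FIRST_NORMAL_XID;
}

/*
 * Strict "a is older than b". For normal IDs the difference is taken
 * modulo 2^32 and read as a signed value, so the ordering survives
 * counter wraparound (relies on two's-complement conversion).
 */
static bool xid_precedes(Xid a, Xid b)
{
    if (!xid_is_normal(a) || !xid_is_normal(b))
        return a < b;
    return (int32_t) (a - b) < 0;
}

/* Non-strict variant: also true when a and b are the same ID. */
static bool xid_precedes_or_equals(Xid a, Xid b)
{
    if (!xid_is_normal(a) || !xid_is_normal(b))
        return a <= b;
    return (int32_t) (a - b) <= 0;
}
```

With a deletion stamp equal to the horizon, `xid_precedes_or_equals` reports true while `xid_precedes` reports false, so under the strict test a page whose stamp equals the horizon is not yet treated as recyclable.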
