Refresh snapshot periodically during index validation · michail-nikolaev/postgres@bc8f8fb · GitHub

Commit bc8f8fb

Refresh snapshot periodically during index validation
Enhances the validation phase of concurrently built indexes by periodically refreshing snapshots instead of using a single reference snapshot. This addresses xmin propagation problems during long-running validations: validation now takes a fresh snapshot every fixed number of heap pages (4096 in this patch), allowing the xmin horizon to advance. This restores the feature of commit d9d0762, which was reverted in commit e28bb88. The new STIR-based approach no longer depends on a single reference snapshot.
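As a rough illustration of why a single long-lived snapshot holds back VACUUM, the sketch below models the horizon as the minimum xmin advertised by any backend. It is a toy model, not PostgreSQL code: `ToyXid`, `TOY_INVALID_XID` and `toy_oldest_xmin` are hypothetical names, and XID wraparound is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;            /* hypothetical stand-in for TransactionId */
#define TOY_INVALID_XID ((ToyXid) 0)

/*
 * Return the smallest valid xmin advertised by any backend, or
 * TOY_INVALID_XID if no backend currently holds a snapshot.
 * Simplification: plain integer comparison, no wraparound handling.
 */
ToyXid
toy_oldest_xmin(const ToyXid *backend_xmin, size_t nbackends)
{
    ToyXid      oldest = TOY_INVALID_XID;

    assert(backend_xmin != NULL || nbackends == 0);

    for (size_t i = 0; i < nbackends; i++)
    {
        if (backend_xmin[i] == TOY_INVALID_XID)
            continue;           /* this backend holds no snapshot */
        if (oldest == TOY_INVALID_XID || backend_xmin[i] < oldest)
            oldest = backend_xmin[i];
    }
    return oldest;
}
```

If one backend keeps advertising xmin 100 for the whole validation, the horizon stays at 100 no matter how far the other backends have moved. Once that backend swaps in a newer snapshot, or briefly holds none, the minimum is decided by the remaining backends.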
1 parent efd01b1 commit bc8f8fb

File tree

13 files changed: +179 -83 lines changed


doc/src/sgml/ref/create_index.sgml

Lines changed: 8 additions & 3 deletions
@@ -881,9 +881,14 @@ Indexes:
   </para>
 
   <para>
-   Like any long-running transaction, <command>CREATE INDEX</command> on a
-   table can affect which tuples can be removed by concurrent
-   <command>VACUUM</command> on any other table.
+   Due to the improved implementation using periodically refreshed snapshots and
+   auxiliary indexes, concurrent index builds have minimal impact on concurrent
+   <command>VACUUM</command> operations. The system automatically advances its
+   internal transaction horizon during the build process, allowing
+   <command>VACUUM</command> to remove dead tuples on other tables without
+   having to wait for the entire index build to complete. Only during very brief
+   periods when snapshots are being refreshed might there be any temporary effect
+   on concurrent <command>VACUUM</command> operations.
   </para>
 
   <para>

doc/src/sgml/ref/reindex.sgml

Lines changed: 7 additions & 4 deletions
@@ -502,10 +502,13 @@ Indexes:
   </para>
 
   <para>
-   Like any long-running transaction, <command>REINDEX</command> on a table
-   can affect which tuples can be removed by concurrent
-   <command>VACUUM</command> on any other table.
-  </para>
+   <command>REINDEX CONCURRENTLY</command> has minimal
+   impact on which tuples can be removed by concurrent <command>VACUUM</command>
+   operations on other tables. This is achieved through periodic snapshot
+   refreshes and the use of auxiliary indexes during the rebuild process,
+   allowing the system to advance its transaction horizon regularly rather than
+   maintaining a single long-running snapshot.
+  </para>
 
   <para>
    <command>REINDEX SYSTEM</command> does not support

src/backend/access/heap/README.HOT

Lines changed: 2 additions & 2 deletions
@@ -401,12 +401,12 @@ use the key value from the live tuple.
 We mark the index open for inserts (but still not ready for reads) then
 we again wait for transactions which have the table open. Then validate
 the index. This searches for tuples missing from the index in auxiliary
-index, and inserts any missing ones if them visible to reference snapshot.
+index, and inserts any missing ones if they are visible to a fresh snapshot.
 Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
-the second reference snapshot is finished. This ensures that nobody is
+the most recently used snapshot is finished. This ensures that nobody is
 alive any longer who could need to see any tuples that might be missing
 from the index, as well as ensuring that no one can see any inconsistent
 rows in a broken HOT chain (the first condition is stronger than the
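The final wait described in this hunk can be pictured as a filter over the xmin values that other backends advertise. The sketch below is only a toy model under stated assumptions: `toy_count_backends_to_wait_for` and the `ToyXid` type are hypothetical, comparisons ignore XID wraparound, and the real server waits on virtual transaction IDs rather than counting array entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;            /* hypothetical stand-in for TransactionId */
#define TOY_INVALID_XID ((ToyXid) 0)

/*
 * Count the backends that would have to finish before the index could be
 * marked valid: those holding a snapshot whose xmin is not newer than
 * limit_xmin.  Backends without a snapshot (invalid xmin) are skipped.
 */
size_t
toy_count_backends_to_wait_for(const ToyXid *backend_xmin,
                               size_t nbackends,
                               ToyXid limit_xmin)
{
    size_t      nwait = 0;

    assert(backend_xmin != NULL || nbackends == 0);

    for (size_t i = 0; i < nbackends; i++)
    {
        if (backend_xmin[i] == TOY_INVALID_XID)
            continue;           /* no snapshot, nothing to wait for */
        if (backend_xmin[i] <= limit_xmin)
            nwait++;            /* snapshot may predate the last one we used */
    }
    return nwait;
}
```

Because the limit comes from the most recent snapshot rather than from a single reference snapshot, the set of backends to wait for is computed against a later point in time.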

src/backend/access/heap/heapam_handler.c

Lines changed: 68 additions & 9 deletions
@@ -2034,23 +2034,26 @@ heapam_index_validate_scan_read_stream_next(
     return result;
 }
 
-static void
+static TransactionId
 heapam_index_validate_scan(Relation heapRelation,
                            Relation indexRelation,
                            IndexInfo *indexInfo,
-                           Snapshot snapshot,
                            ValidateIndexState *state,
                            ValidateIndexState *auxState)
 {
+    TransactionId limitXmin;
+
     Datum values[INDEX_MAX_KEYS];
     bool isnull[INDEX_MAX_KEYS];
 
+    Snapshot snapshot;
     TupleTableSlot *slot;
     EState *estate;
     ExprContext *econtext;
     BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
-    int num_to_check;
+    int num_to_check,
+        page_read_counter = 1; /* set to 1 to skip snapshot reset at start */
     Tuplestorestate *tuples_for_check;
     ValidateIndexScanState callback_private_data;
 
@@ -2061,14 +2064,16 @@ heapam_index_validate_scan(Relation heapRelation,
     /* Use 10% of memory for tuple store. */
     int store_work_mem_part = maintenance_work_mem / 10;
 
-    /*
-     * Encode TIDs as int8 values for the sort, rather than directly sorting
-     * item pointers. This can be significantly faster, primarily because TID
-     * is a pass-by-reference type on all platforms, whereas int8 is
-     * pass-by-value on most platforms.
-     */
+    PushActiveSnapshot(GetTransactionSnapshot());
+
     tuples_for_check = tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
+    PopActiveSnapshot();
+    InvalidateCatalogSnapshot();
+
+    Assert(!HaveRegisteredOrActiveSnapshot());
+    Assert(!TransactionIdIsValid(MyProc->xmin));
+
     /*
      * sanity checks
      */
@@ -2084,6 +2089,29 @@ heapam_index_validate_scan(Relation heapRelation,
 
     state->tuplesort = auxState->tuplesort = NULL;
 
+    /*
+     * Now take the first snapshot that will be used to filter candidate
+     * tuples. We are going to replace it with a newer snapshot every so often
+     * to advance the xmin horizon.
+     *
+     * Beware! There might still be snapshots in use that treat some transaction
+     * as in-progress that our temporary snapshot treats as committed.
+     *
+     * If such a recently-committed transaction deleted tuples in the table,
+     * we will not include them in the index; yet those transactions which
+     * see the deleting one as still-in-progress will expect such tuples to
+     * be there once we mark the index as valid.
+     *
+     * We solve this by waiting for all endangered transactions to exit before
+     * we mark the index as valid; that is why limitXmin is tracked.
+     *
+     * We also set ActiveSnapshot to this snap, since functions in indexes may
+     * need a snapshot.
+     */
+    snapshot = RegisterSnapshot(GetLatestSnapshot());
+    PushActiveSnapshot(snapshot);
+    limitXmin = snapshot->xmin;
+
     estate = CreateExecutorState();
     econtext = GetPerTupleExprContext(estate);
     slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
@@ -2117,6 +2145,7 @@ heapam_index_validate_scan(Relation heapRelation,
 
         LockBuffer(buf, BUFFER_LOCK_SHARE);
         block_number = BufferGetBlockNumber(buf);
+        page_read_counter++;
 
         i = 0;
         while ((off = tuples[i]) != InvalidOffsetNumber)
@@ -2172,6 +2201,20 @@ heapam_index_validate_scan(Relation heapRelation,
         }
 
         ReleaseBuffer(buf);
+#define VALIDATE_INDEX_RESET_SNAPSHOT_EACH_N_PAGE 4096
+        if (page_read_counter % VALIDATE_INDEX_RESET_SNAPSHOT_EACH_N_PAGE == 0)
+        {
+            PopActiveSnapshot();
+            UnregisterSnapshot(snapshot);
+            /* to make sure we propagate xmin */
+            InvalidateCatalogSnapshot();
+            Assert(!TransactionIdIsValid(MyProc->xmin));
+
+            snapshot = RegisterSnapshot(GetLatestSnapshot());
+            PushActiveSnapshot(snapshot);
+            /* xmin should not go backwards, but just in case */
+            limitXmin = TransactionIdNewer(limitXmin, snapshot->xmin);
+        }
     }
 
     ExecDropSingleTupleTableSlot(slot);
@@ -2181,9 +2224,25 @@ heapam_index_validate_scan(Relation heapRelation,
     read_stream_end(read_stream);
     tuplestore_end(tuples_for_check);
 
+    /*
+     * Drop the latest snapshot. We must do this before waiting out other
+     * snapshot holders, else we will deadlock against other processes also
+     * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
+     * they must wait for.
+     */
+    PopActiveSnapshot();
+    UnregisterSnapshot(snapshot);
+    InvalidateCatalogSnapshot();
+    Assert(MyProc->xmin == InvalidTransactionId);
+#if USE_INJECTION_POINTS
+    if (MyProc->xid == InvalidTransactionId)
+        INJECTION_POINT("heapam_index_validate_scan_no_xid", NULL);
+#endif
     /* These may have been pointing to the now-gone estate */
     indexInfo->ii_ExpressionsState = NIL;
     indexInfo->ii_PredicateState = NULL;
+
+    return limitXmin;
 }
 
 /*
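The refresh cadence and the `limitXmin` bookkeeping in this file can be tried out in isolation with the standalone sketch below. It is a simplified model, not the patch itself: the `toy_` functions are hypothetical, the counter logic mirrors `page_read_counter` starting at 1, and `toy_xid_newer` uses a modulo-2^32 comparison without the special-XID handling that PostgreSQL's own helpers have.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ToyXid;            /* hypothetical stand-in for TransactionId */

/* True if a is logically later than b, comparing modulo 2^32. */
int
toy_xid_follows(ToyXid a, ToyXid b)
{
    return (int32_t) (a - b) > 0;
}

/* Return whichever XID is logically later; used so the limit never moves back. */
ToyXid
toy_xid_newer(ToyXid a, ToyXid b)
{
    return toy_xid_follows(a, b) ? a : b;
}

/*
 * Count how many snapshot refreshes a scan over npages pages would perform
 * when the counter starts at 1, is incremented once per page, and a refresh
 * happens whenever the counter is a multiple of interval.
 */
unsigned
toy_count_snapshot_refreshes(unsigned npages, unsigned interval)
{
    unsigned    page_read_counter = 1;  /* 1 skips a refresh at the start */
    unsigned    refreshes = 0;

    assert(interval > 0);

    for (unsigned page = 0; page < npages; page++)
    {
        page_read_counter++;
        if (page_read_counter % interval == 0)
            refreshes++;
    }
    return refreshes;
}
```

With an interval of 4096, the first refresh happens after the 4095th page because the counter starts at 1. Taking the newer of the old limit and each new snapshot's xmin keeps the returned limit monotonic even if a later snapshot reported an older value.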

src/backend/access/nbtree/nbtsort.c

Lines changed: 1 addition & 1 deletion
@@ -444,7 +444,7 @@ _bt_spools_heapscan(Relation heap, Relation index, BTBuildState *buildstate,
      * dead tuples) won't get very full, so we give it only work_mem.
      *
      * In case of concurrent build dead tuples are not need to be put into index
-     * since we wait for all snapshots older than reference snapshot during the
+     * since we wait for all snapshots older than the latest snapshot during the
      * validation phase.
      */
     if (indexInfo->ii_Unique && !indexInfo->ii_Concurrent)

src/backend/access/spgist/spgvacuum.c

Lines changed: 9 additions & 3 deletions
@@ -191,14 +191,16 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
                  * Add target TID to pending list if the redirection could have
                  * happened since VACUUM started. (If xid is invalid, assume it
                  * must have happened before VACUUM started, since REINDEX
-                 * CONCURRENTLY locks out VACUUM.)
+                 * CONCURRENTLY locks out VACUUM; if myXmin is invalid, this is a
+                 * validation scan.)
                  *
                  * Note: we could make a tighter test by seeing if the xid is
                  * "running" according to the active snapshot; but snapmgr.c
                  * doesn't currently export a suitable API, and it's not entirely
                  * clear that a tighter test is worth the cycles anyway.
                  */
-                if (TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
+                if (!TransactionIdIsValid(bds->myXmin) ||
+                    TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
                     spgAddPendingTID(bds, &dt->pointer);
             }
             else
@@ -808,7 +810,6 @@ spgvacuumscan(spgBulkDeleteState *bds)
     /* Finish setting up spgBulkDeleteState */
     initSpGistState(&bds->spgstate, index);
     bds->pendingList = NULL;
-    bds->myXmin = GetActiveSnapshot()->xmin;
     bds->lastFilledBlock = SPGIST_LAST_FIXED_BLKNO;
 
     /*
@@ -959,6 +960,10 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
     bds.stats = stats;
     bds.callback = callback;
     bds.callback_state = callback_state;
+    if (info->validate_index)
+        bds.myXmin = InvalidTransactionId;
+    else
+        bds.myXmin = GetActiveSnapshot()->xmin;
 
     spgvacuumscan(&bds);
 
@@ -999,6 +1004,7 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
     bds.stats = stats;
     bds.callback = dummy_callback;
     bds.callback_state = NULL;
+    bds.myXmin = GetActiveSnapshot()->xmin;
 
     spgvacuumscan(&bds);
 }
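The changed condition in `vacuumLeafPage` reduces to a small predicate: with no snapshot xmin available (the validation-scan case), every redirect target is treated as possibly recent. The sketch below models only that condition. The `toy_` names are hypothetical, and the comparison is a plain modulo-2^32 test rather than PostgreSQL's `TransactionIdFollowsOrEquals`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;            /* hypothetical stand-in for TransactionId */
#define TOY_INVALID_XID ((ToyXid) 0)

/* True if a is logically later than or equal to b, comparing modulo 2^32. */
bool
toy_xid_follows_or_equals(ToyXid a, ToyXid b)
{
    return (int32_t) (a - b) >= 0;
}

/*
 * Decide whether a redirect's target TID should go on the pending list.
 * An invalid my_xmin means no snapshot is held, so be conservative and
 * always add the target.
 */
bool
toy_should_add_pending(ToyXid redirect_xid, ToyXid my_xmin)
{
    if (my_xmin == TOY_INVALID_XID)
        return true;
    return toy_xid_follows_or_equals(redirect_xid, my_xmin);
}
```

For example, `toy_should_add_pending(50, TOY_INVALID_XID)` is true, while `toy_should_add_pending(50, 100)` is false because the redirect predates the snapshot.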

src/backend/catalog/index.c

Lines changed: 31 additions & 11 deletions
@@ -3534,8 +3534,9 @@ IndexCheckExclusion(Relation heapRelation,
  * insert their new tuples into it. At the same moment we clear "indisready" for
  * auxiliary index, since it is no more required to be updated.
  *
- * We then take a new reference snapshot, any tuples that are valid according
- * to this snap, but are not in the index, must be added to the index.
+ * We then take a new snapshot; any tuples that are valid according
+ * to this snap, but are not in the index, must be added to the index. In
+ * order to propagate xmin we reset that snapshot every so often.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction. Any tuples committed dead before
  * the snap need not be indexed, because we will wait out all transactions
@@ -3548,7 +3549,7 @@ IndexCheckExclusion(Relation heapRelation,
  * TIDs of both auxiliary and target indexes, and doing a "merge join" against
  * the TID lists to see which tuples from auxiliary index are missing from the
  * target index. Thus we will ensure that all tuples valid according to the
- * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * latest snapshot are in the index. Notice we need to do bulkdelete in the
  * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
@@ -3569,13 +3570,14 @@ IndexCheckExclusion(Relation heapRelation,
  *
  * Also, some actions to concurrent drop the auxiliary index are performed.
  */
-void
-validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
+TransactionId
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId)
 {
     Relation heapRelation,
              indexRelation,
              auxIndexRelation;
     IndexInfo *indexInfo;
+    TransactionId limitXmin;
     IndexVacuumInfo ivinfo, auxivinfo;
     ValidateIndexState state, auxState;
     Oid save_userid;
@@ -3625,8 +3627,12 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
      * Fetch info needed for index_insert. (You might think this should be
      * passed in from DefineIndex, but its copy is long gone due to having
      * been built in a previous transaction.)
+     *
+     * We might need a snapshot for index expressions or predicates.
      */
+    PushActiveSnapshot(GetTransactionSnapshot());
     indexInfo = BuildIndexInfo(indexRelation);
+    PopActiveSnapshot();
 
     /* mark build is concurrent just for consistency */
     indexInfo->ii_Concurrent = true;
@@ -3662,6 +3668,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
                                             NULL, TUPLESORT_NONE);
     auxState.htups = auxState.itups = auxState.tups_inserted = 0;
 
+    /* tuplesort_begin_datum may require catalog snapshot */
+    InvalidateCatalogSnapshot();
+
     (void) index_bulk_delete(&auxivinfo, NULL,
                              validate_index_callback, &auxState);
 
@@ -3671,6 +3680,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
                                          NULL, TUPLESORT_NONE);
     state.htups = state.itups = state.tups_inserted = 0;
 
+    /* tuplesort_begin_datum may require catalog snapshot */
+    InvalidateCatalogSnapshot();
+
     /* ambulkdelete updates progress metrics */
     (void) index_bulk_delete(&ivinfo, NULL,
                              validate_index_callback, &state);
@@ -3690,19 +3702,24 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
         pgstat_progress_update_multi_param(3, progress_index, progress_vals);
     }
     tuplesort_performsort(state.tuplesort);
+    /* tuplesort_performsort may require catalog snapshot */
+    InvalidateCatalogSnapshot();
+
     tuplesort_performsort(auxState.tuplesort);
+    /* tuplesort_performsort may require catalog snapshot */
+    InvalidateCatalogSnapshot();
+    Assert(!TransactionIdIsValid(MyProc->xmin));
 
     /*
      * Now merge both indexes
      */
     pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
                                  PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
-    table_index_validate_scan(heapRelation,
-                              indexRelation,
-                              indexInfo,
-                              snapshot,
-                              &state,
-                              &auxState);
+    limitXmin = table_index_validate_scan(heapRelation,
+                                          indexRelation,
+                                          indexInfo,
+                                          &state,
+                                          &auxState);
 
     /* Tuple sort closed by table_index_validate_scan */
     Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
@@ -3725,6 +3742,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
     index_close(auxIndexRelation, NoLock);
     index_close(indexRelation, NoLock);
     table_close(heapRelation, NoLock);
+
+    Assert(!TransactionIdIsValid(MyProc->xmin));
+    return limitXmin;
 }
 
 /*
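The "merge join" over sorted TID lists that the header comment of `validate_index` describes can be sketched on plain arrays. This is a minimal model under stated assumptions: TIDs are represented as already-encoded 64-bit integers, both inputs are sorted ascending without duplicates, and `toy_find_missing` is a hypothetical name. The real scan additionally fetches each candidate heap tuple and checks it against the current snapshot, which is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy into missing[] every value of aux[] that does not occur in target[].
 * Both inputs must be sorted ascending and free of duplicates; missing[]
 * needs room for naux entries.  Returns the number of values written.
 */
size_t
toy_find_missing(const int64_t *aux, size_t naux,
                 const int64_t *target, size_t ntarget,
                 int64_t *missing)
{
    size_t      j = 0;
    size_t      nmissing = 0;

    assert(missing != NULL || naux == 0);

    for (size_t i = 0; i < naux; i++)
    {
        /* Advance the target cursor past everything smaller than aux[i]. */
        while (j < ntarget && target[j] < aux[i])
            j++;

        /* No match at the cursor means the target index lacks this TID. */
        if (j >= ntarget || target[j] != aux[i])
            missing[nmissing++] = aux[i];
    }
    return nmissing;
}
```

Each list is walked once, so the cost is linear in the combined length of the two sorted inputs.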
