8000 Protect against SnapshotNow race conditions in pg_tablespace scans. · mwarnock/postgres@6eedec1 · GitHub
[go: up one dir, main page]

Skip to content

Commit 6eedec1

Browse files
committed
Protect against SnapshotNow race conditions in pg_tablespace scans.
Use of SnapshotNow is known to expose us to race conditions if the tuple(s) being sought could be updated by concurrently-committing transactions. CREATE DATABASE and DROP DATABASE are particularly exposed because they do heavyweight filesystem operations during their scans of pg_tablespace, so that the scans run for a very long time compared to most. Furthermore, the potential consequences of a missed or twice-visited row are nastier than average: * createdb() could fail with a bogus "file already exists" error, or silently fail to copy one or more tablespace's worth of files into the new database. * remove_dbtablespaces() could miss one or more tablespaces, thus failing to free filesystem space for the dropped database. * check_db_file_conflict() could likewise miss a tablespace, leading to an OID conflict that could result in data loss either immediately or in future operations. (This seems of very low probability, though, since a duplicate database OID would be unlikely to start with.) Hence, it seems worth fixing these three places to use MVCC snapshots, even though this will someday be superseded by a generic solution to SnapshotNow race conditions. Back-patch to all active branches. Stephen Frost and Tom Lane
1 parent 46dc33f commit 6eedec1

File tree

1 file changed

+48
-3
lines changed

1 file changed

+48
-3
lines changed

src/backend/commands/dbcommands.c

Lines changed: 48 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -107,6 +107,7 @@ createdb(const CreatedbStmt *stmt)
107107
int dbconnlimit = -1;
108108
int ctype_encoding;
109109
createdb_failure_params fparms;
110+
Snapshot snapshot;
110111

111112
/* Extract options from the statement node tree */
112113
foreach(option, stmt->options)
@@ -460,6 +461,29 @@ createdb(const CreatedbStmt *stmt)
460461
*/
461462
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
462463

464+
/*
465+
* Take an MVCC snapshot to use while scanning through pg_tablespace. For
466+
* safety, copy the snapshot (this prevents it from changing if
467+
* something else were to request a snapshot during the loop).
468+
*
469+
* Traversing pg_tablespace with an MVCC snapshot is necessary to provide
470+
* us with a consistent view of the tablespaces that exist. Using
471+
* SnapshotNow here would risk seeing the same tablespace multiple times,
472+
* or worse not seeing a tablespace at all, if its tuple is moved around
473+
* by a concurrent update (eg an ACL change).
474+
*
475+
* Inconsistency of this sort is inherent to all SnapshotNow scans, unless
476+
* some lock is held to prevent concurrent updates of the rows being
477+
* sought. There should be a generic fix for that, but in the meantime
478+
* it's worth fixing this case in particular because we are doing very
479+
* heavyweight operations within the scan, so that the elapsed time for
480+
* the scan is vastly longer than for most other catalog scans. That
481+
* means there's a much wider window for concurrent updates to cause
482+
* trouble here than anywhere else. XXX this code should be changed
483+
* whenever a generic fix is implemented.
484+
*/
485+
snapshot = CopySnapshot(GetLatestSnapshot());
486+
463487
/*
464488
* Once we start copying subdirectories, we need to be able to clean 'em
465489
* up if we fail. Use an ENSURE block to make sure this happens. (This
@@ -477,7 +501,7 @@ createdb(const CreatedbStmt *stmt)
477501
* each one to the new database.
478502
*/
479503
rel = heap_open(TableSpaceRelationId, AccessShareLock);
480-
scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
504+
scan = heap_beginscan(rel, snapshot, 0, NULL);
481505
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
482506
{
483507
Oid srctablespace = HeapTupleGetOid(tuple);
@@ -1322,9 +1346,20 @@ remove_dbtablespaces(Oid db_id)
13221346
Relation rel;
13231347
HeapScanDesc scan;
13241348
HeapTuple tuple;
1349+
Snapshot snapshot;
1350+
1351+
/*
1352+
* As in createdb(), we'd better use an MVCC snapshot here, since this
1353+
* scan can run for a long time. Duplicate visits to tablespaces would be
1354+
* harmless, but missing a tablespace could result in permanently leaked
1355+
* files.
1356+
*
1357+
* XXX change this when a generic fix for SnapshotNow races is implemented
1358+
*/
1359+
snapshot = CopySnapshot(GetLatestSnapshot());
13251360

13261361
rel = heap_open(TableSpaceRelationId, AccessShareLock);
1327-
scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
1362+
scan = heap_beginscan(rel, snapshot, 0, NULL);
13281363
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
13291364
{
13301365
Oid dsttablespace = HeapTupleGetOid(tuple);
@@ -1391,9 +1426,19 @@ check_db_file_conflict(Oid db_id)
13911426
Relation rel;
13921427
HeapScanDesc scan;
13931428
HeapTuple tuple;
1429+
Snapshot snapshot;
1430+
1431+
/*
1432+
* As in createdb(), we'd better use an MVCC snapshot here; missing a
1433+
* tablespace could result in falsely reporting the OID is unique, with
1434+
* disastrous future consequences per the comment above.
1435+
*
1436+
* XXX change this when a generic fix for SnapshotNow races is implemented
1437+
*/
1438+
snapshot = CopySnapshot(GetLatestSnapshot());
13941439

13951440
rel = heap_open(TableSpaceRelationId, AccessShareLock);
1396-
scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
1441+
scan = heap_beginscan(rel, snapshot, 0, NULL);
13971442
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
13981443
{
13991444
Oid dsttablespace = HeapTupleGetOid(tuple);

0 commit comments

Comments
 (0)
0