Commit divag711/postgres@4b0e717

Avoid holding AutovacuumScheduleLock while rechecking table statistics.
In databases with many tables, re-fetching the statistics takes some time, so that this behavior seriously decreases the available concurrency for multiple autovac workers. There's discussion afoot about more complete fixes, but a simple and back-patchable amelioration is to claim the table and release the lock before rechecking stats. If we find out there's no longer a reason to process the table, re-taking the lock to un-claim the table is cheap enough.

(This patch is quite old, but got lost amongst a discussion of more aggressive fixes. It's not clear when or if such a fix will be accepted, but in any case it'd be unlikely to get back-patched. Let's do this now so we have some improvement for the back branches.)

In passing, make the normal un-claim step take AutovacuumScheduleLock not AutovacuumLock, since that is what is documented to protect the wi_tableoid field. This wasn't an actual bug in view of the fact that readers of that field hold both locks, but it creates some concurrency penalty against operations that need only AutovacuumLock.

Back-patch to all supported versions.

Jeff Janes

Discussion: https://postgr.es/m/26118.1520865816@sss.pgh.pa.us
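The claim-then-recheck ordering described in the message can be illustrated outside PostgreSQL with a plain mutex. This is only a minimal sketch under stated assumptions, not the patched code: `WorkerSlot`, `workers`, `schedule_lock`, `claim_then_recheck`, `release_claim`, and the two sample callbacks are hypothetical names, a pthread mutex stands in for AutovacuumScheduleLock, and the callback stands in for the slow statistics recheck.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NUM_WORKERS   4
#define INVALID_TABLE 0u

/* Hypothetical stand-in for the shared worker array (not WorkerInfoData). */
typedef struct
{
    unsigned claimed_table;     /* INVALID_TABLE when nothing is claimed */
} WorkerSlot;

static WorkerSlot workers[NUM_WORKERS];
static pthread_mutex_t schedule_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the expensive recheck; it is called with no lock held. */
typedef bool (*recheck_fn) (unsigned table);

/* Sample callbacks for experimentation (assumptions, not real checks). */
static bool always_needed(unsigned table) { (void) table; return true; }
static bool never_needed(unsigned table) { (void) table; return false; }

/* Drop worker "me"'s claim; the lock is held only for the store. */
static void
release_claim(int me)
{
    pthread_mutex_lock(&schedule_lock);
    workers[me].claimed_table = INVALID_TABLE;
    pthread_mutex_unlock(&schedule_lock);
}

/*
 * Claim "table" for worker "me" under the lock, then recheck it after
 * releasing the lock.  Returns true if the caller should process the table.
 */
static bool
claim_then_recheck(int me, unsigned table, recheck_fn still_needs_work)
{
    bool taken = false;

    assert(me >= 0 && me < NUM_WORKERS);
    assert(table != INVALID_TABLE);

    /* Short critical section: look for a competing claim, else claim. */
    pthread_mutex_lock(&schedule_lock);
    for (int i = 0; i < NUM_WORKERS; i++)
    {
        if (i != me && workers[i].claimed_table == table)
            taken = true;
    }
    if (!taken)
        workers[me].claimed_table = table;
    pthread_mutex_unlock(&schedule_lock);

    if (taken)
        return false;           /* another worker already has it */

    /* Slow part runs unlocked, so other workers can claim other tables. */
    if (still_needs_work(table))
        return true;

    /* Nothing to do after all: briefly re-take the lock to un-claim. */
    release_claim(me);
    return false;
}
```

A caller would invoke `claim_then_recheck(me, table, callback)` once per candidate table and call `release_claim(me)` after processing. The trade-off is that a table may be claimed briefly and then released, which costs one extra short lock acquisition but keeps the slow recheck out of the critical section.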
1 parent 44a36a8 commit 4b0e717

1 file changed, +37 -16 lines changed

src/backend/postmaster/autovacuum.c

Lines changed: 37 additions & 16 deletions
@@ -209,9 +209,9 @@ typedef struct autovac_table
  * wi_launchtime	Time at which this worker was launched
  * wi_cost_*		Vacuum cost-based delay parameters current in this worker
  *
- * All fields are protected by AutovacuumLock, except for wi_tableoid which is
- * protected by AutovacuumScheduleLock (which is read-only for everyone except
- * that worker itself).
+ * All fields are protected by AutovacuumLock, except for wi_tableoid and
+ * wi_sharedrel which are protected by AutovacuumScheduleLock (note these
+ * two fields are read-only for everyone except that worker itself).
  *-------------
  */
 typedef struct WorkerInfoData
@@ -2197,7 +2197,9 @@ do_autovacuum(void)
 	foreach(cell, table_oids)
 	{
 		Oid relid = lfirst_oid(cell);
+		HeapTuple classTup;
 		autovac_table *tab;
+		bool isshared;
 		bool skipit;
 		int stdVacuumCostDelay;
 		int stdVacuumCostLimit;
@@ -2222,9 +2224,23 @@ do_autovacuum(void)
 		}
 
 		/*
-		 * hold schedule lock from here until we're sure that this table still
-		 * needs vacuuming. We also need the AutovacuumLock to walk the
-		 * worker array, but we'll let go of that one quickly.
+		 * Find out whether the table is shared or not. (It's slightly
+		 * annoying to fetch the syscache entry just for this, but in typical
+		 * cases it adds little cost because table_recheck_autovac would
+		 * refetch the entry anyway. We could buy that back by copying the
+		 * tuple here and passing it to table_recheck_autovac, but that
+		 * increases the odds of that function working with stale data.)
+		 */
+		classTup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+		if (!HeapTupleIsValid(classTup))
+			continue;			/* somebody deleted the rel, forget it */
+		isshared = ((Form_pg_class) GETSTRUCT(classTup))->relisshared;
+		ReleaseSysCache(classTup);
+
+		/*
+		 * Hold schedule lock from here until we've claimed the table. We
+		 * also need the AutovacuumLock to walk the worker array, but that one
+		 * can just be a shared lock.
 		 */
 		LWLockAcquire(AutovacuumScheduleLock, LW_EXCLUSIVE);
 		LWLockAcquire(AutovacuumLock, LW_SHARED);
@@ -2260,6 +2276,16 @@ do_autovacuum(void)
 			continue;
 		}
 
+		/*
+		 * Store the table's OID in shared memory before releasing the
+		 * schedule lock, so that other workers don't try to vacuum it
+		 * concurrently. (We claim it here so as not to hold
+		 * AutovacuumScheduleLock while rechecking the stats.)
+		 */
+		MyWorkerInfo->wi_tableoid = relid;
+		MyWorkerInfo->wi_sharedrel = isshared;
+		LWLockRelease(AutovacuumScheduleLock);
+
 		/*
 		 * Check whether pgstat data still says we need to vacuum this table.
 		 * It could have changed if something else processed the table while
@@ -2276,18 +2302,13 @@ do_autovacuum(void)
 		if (tab == NULL)
 		{
 			/* someone else vacuumed the table, or it went away */
+			LWLockAcquire(AutovacuumScheduleLock, LW_EXCLUSIVE);
+			MyWorkerInfo->wi_tableoid = InvalidOid;
+			MyWorkerInfo->wi_sharedrel = false;
 			LWLockRelease(AutovacuumScheduleLock);
 			continue;
 		}
 
-		/*
-		 * Ok, good to go. Store the table in shared memory before releasing
-		 * the lock so that other workers don't vacuum it concurrently.
-		 */
-		MyWorkerInfo->wi_tableoid = relid;
-		MyWorkerInfo->wi_sharedrel = tab->at_sharedrel;
-		LWLockRelease(AutovacuumScheduleLock);
-
 		/*
 		 * Remember the prevailing values of the vacuum cost GUCs. We have to
 		 * restore these at the bottom of the loop, else we'll compute wrong
@@ -2397,10 +2418,10 @@ do_autovacuum(void)
 		 * settings, so we don't want to give up our share of I/O for a very
 		 * short interval and thereby thrash the global balance.
 		 */
-		LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+		LWLockAcquire(AutovacuumScheduleLock, LW_EXCLUSIVE);
 		MyWorkerInfo->wi_tableoid = InvalidOid;
 		MyWorkerInfo->wi_sharedrel = false;
-		LWLockRelease(AutovacuumLock);
+		LWLockRelease(AutovacuumScheduleLock);
 
 		/* restore vacuum cost GUCs for the next iteration */
 		VacuumCostDelay = stdVacuumCostDelay;
