Avoid holding AutovacuumScheduleLock while rechecking table statistics. · linzhihui/postgres@231329a · GitHub

Commit 231329a

Avoid holding AutovacuumScheduleLock while rechecking table statistics.
In databases with many tables, re-fetching the statistics takes some time, so that this behavior seriously decreases the available concurrency for multiple autovac workers. There's discussion afoot about more complete fixes, but a simple and back-patchable amelioration is to claim the table and release the lock before rechecking stats. If we find out there's no longer a reason to process the table, re-taking the lock to un-claim the table is cheap enough.

(This patch is quite old, but got lost amongst a discussion of more aggressive fixes. It's not clear when or if such a fix will be accepted, but in any case it'd be unlikely to get back-patched. Let's do this now so we have some improvement for the back branches.)

In passing, make the normal un-claim step take AutovacuumScheduleLock not AutovacuumLock, since that is what is documented to protect the wi_tableoid field. This wasn't an actual bug in view of the fact that readers of that field hold both locks, but it creates some concurrency penalty against operations that need only AutovacuumLock.

Back-patch to all supported versions.

Jeff Janes

Discussion: https://postgr.es/m/26118.1520865816@sss.pgh.pa.us
1 parent 95f0260 commit 231329a
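
The claim-then-recheck ordering described in the commit message can be modelled outside PostgreSQL. The following is only a minimal sketch under stated assumptions: a single pthread mutex stands in for AutovacuumScheduleLock, and `claimed_table`, `needs_vacuum`, `claim_table`, `unclaim_table`, `recheck_stats` and `process_candidate` are hypothetical names invented for illustration, not PostgreSQL symbols. The real code also takes AutovacuumLock in shared mode while walking the worker array, which this model leaves out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NUM_WORKERS 4
#define NUM_TABLES  8
#define NO_TABLE    0           /* plays the role of InvalidOid */

/* Hypothetical stand-ins for shared state; not PostgreSQL's real layout. */
static int  claimed_table[NUM_WORKERS];  /* guarded by schedule_lock */
static bool needs_vacuum[NUM_TABLES];    /* fake "statistics" */
static pthread_mutex_t schedule_lock = PTHREAD_MUTEX_INITIALIZER;

/* Claim `table` for `worker` unless another worker already holds it. */
static bool
claim_table(int worker, int table)
{
    bool ok = true;

    pthread_mutex_lock(&schedule_lock);
    for (int i = 0; i < NUM_WORKERS; i++)
    {
        if (i != worker && claimed_table[i] == table)
        {
            ok = false;
            break;
        }
    }
    if (ok)
        claimed_table[worker] = table;
    pthread_mutex_unlock(&schedule_lock);
    return ok;
}

/* Drop the claim; short enough that re-taking the lock is cheap. */
static void
unclaim_table(int worker)
{
    pthread_mutex_lock(&schedule_lock);
    claimed_table[worker] = NO_TABLE;
    pthread_mutex_unlock(&schedule_lock);
}

/* Stand-in for the slow statistics recheck; called with no lock held. */
static bool
recheck_stats(int table)
{
    assert(table > NO_TABLE && table < NUM_TABLES);
    return needs_vacuum[table];
}

/* Returns true if this worker went on to process the table. */
static bool
process_candidate(int worker, int table)
{
    if (!claim_table(worker, table))
        return false;           /* another worker is already on it */

    if (!recheck_stats(table))
    {
        unclaim_table(worker);  /* no longer needed: back out the claim */
        return false;
    }

    /* ... the actual work would run here, with the claim still held ... */
    unclaim_table(worker);
    return true;
}
```

The design point is that the lock is held only across the two short critical sections, never across `recheck_stats`, so other workers can keep picking tables while one worker is busy re-reading statistics.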

1 file changed: +37 -16 lines changed

src/backend/postmaster/autovacuum.c

Lines changed: 37 additions & 16 deletions
@@ -209,9 +209,9 @@ typedef struct autovac_table
  * wi_launchtime    Time at which this worker was launched
  * wi_cost_*        Vacuum cost-based delay parameters current in this worker
  *
- * All fields are protected by AutovacuumLock, except for wi_tableoid which is
- * protected by AutovacuumScheduleLock (which is read-only for everyone except
- * that worker itself).
+ * All fields are protected by AutovacuumLock, except for wi_tableoid and
+ * wi_sharedrel which are protected by AutovacuumScheduleLock (note these
+ * two fields are read-only for everyone except that worker itself).
  *-------------
  */
 typedef struct WorkerInfoData
@@ -2205,7 +2205,9 @@ do_autovacuum(void)
     foreach(cell, table_oids)
     {
         Oid         relid = lfirst_oid(cell);
+        HeapTuple   classTup;
         autovac_table *tab;
+        bool        isshared;
         bool        skipit;
         int         stdVacuumCostDelay;
         int         stdVacuumCostLimit;
@@ -2230,9 +2232,23 @@ do_autovacuum(void)
         }
 
         /*
-         * hold schedule lock from here until we're sure that this table still
-         * needs vacuuming. We also need the AutovacuumLock to walk the
-         * worker array, but we'll let go of that one quickly.
+         * Find out whether the table is shared or not. (It's slightly
+         * annoying to fetch the syscache entry just for this, but in typical
+         * cases it adds little cost because table_recheck_autovac would
+         * refetch the entry anyway. We could buy that back by copying the
+         * tuple here and passing it to table_recheck_autovac, but that
+         * increases the odds of that function working with stale data.)
+         */
+        classTup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+        if (!HeapTupleIsValid(classTup))
+            continue;           /* somebody deleted the rel, forget it */
+        isshared = ((Form_pg_class) GETSTRUCT(classTup))->relisshared;
+        ReleaseSysCache(classTup);
+
+        /*
+         * Hold schedule lock from here until we've claimed the table. We
+         * also need the AutovacuumLock to walk the worker array, but that one
+         * can just be a shared lock.
          */
         LWLockAcquire(AutovacuumScheduleLock, LW_EXCLUSIVE);
         LWLockAcquire(AutovacuumLock, LW_SHARED);
@@ -2268,6 +2284,16 @@ do_autovacuum(void)
             continue;
         }
 
+        /*
+         * Store the table's OID in shared memory before releasing the
+         * schedule lock, so that other workers don't try to vacuum it
+         * concurrently. (We claim it here so as not to hold
+         * AutovacuumScheduleLock while rechecking the stats.)
+         */
+        MyWorkerInfo->wi_tableoid = relid;
+        MyWorkerInfo->wi_sharedrel = isshared;
+        LWLockRelease(AutovacuumScheduleLock);
+
         /*
          * Check whether pgstat data still says we need to vacuum this table.
          * It could have changed if something else processed the table while
@@ -2284,18 +2310,13 @@ do_autovacuum(void)
         if (tab == NULL)
         {
             /* someone else vacuumed the table, or it went away */
+            LWLockAcquire(AutovacuumScheduleLock, LW_EXCLUSIVE);
+            MyWorkerInfo->wi_tableoid = InvalidOid;
+            MyWorkerInfo->wi_sharedrel = false;
             LWLockRelease(AutovacuumScheduleLock);
             continue;
         }
 
-        /*
-         * Ok, good to go. Store the table in shared memory before releasing
-         * the lock so that other workers don't vacuum it concurrently.
-         */
-        MyWorkerInfo->wi_tableoid = relid;
-        MyWorkerInfo->wi_sharedrel = tab->at_sharedrel;
-        LWLockRelease(AutovacuumScheduleLock);
-
         /*
          * Remember the prevailing values of the vacuum cost GUCs. We have to
          * restore these at the bottom of the loop, else we'll compute wrong
@@ -2405,10 +2426,10 @@ do_autovacuum(void)
          * settings, so we don't want to give up our share of I/O for a very
          * short interval and thereby thrash the global balance.
          */
-        LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+        LWLockAcquire(AutovacuumScheduleLock, LW_EXCLUSIVE);
         MyWorkerInfo->wi_tableoid = InvalidOid;
         MyWorkerInfo->wi_sharedrel = false;
-        LWLockRelease(AutovacuumLock);
+        LWLockRelease(AutovacuumScheduleLock);
 
         /* restore vacuum cost GUCs for the next iteration */
         VacuumCostDelay = stdVacuumCostDelay;
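
The last hunk moves the un-claim step from AutovacuumLock to AutovacuumScheduleLock, matching the updated struct comment. As a rough illustration of why that matters, here is a small model of the field-to-lock discipline. It is a sketch under assumptions, not the real implementation: plain pthread mutexes replace PostgreSQL's LWLocks (so the shared/exclusive distinction is lost), and `WorkerSlot`, `wi_cost_limit`, `set_claim`, `clear_claim`, `set_cost_limit` and `is_claimed` are hypothetical names that only echo the fields in the diff.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define INVALID_OID 0u

/* Hypothetical miniature of one worker slot; field names echo the diff. */
typedef struct WorkerSlot
{
    unsigned wi_tableoid;    /* guarded by schedule_lock */
    bool     wi_sharedrel;   /* guarded by schedule_lock */
    int      wi_cost_limit;  /* guarded by general_lock */
} WorkerSlot;

static pthread_mutex_t schedule_lock = PTHREAD_MUTEX_INITIALIZER; /* ~ AutovacuumScheduleLock */
static pthread_mutex_t general_lock = PTHREAD_MUTEX_INITIALIZER;  /* ~ AutovacuumLock */
static WorkerSlot slot;

/* The owning worker records its claim under the schedule lock only. */
static void
set_claim(unsigned oid, bool shared)
{
    pthread_mutex_lock(&schedule_lock);
    slot.wi_tableoid = oid;
    slot.wi_sharedrel = shared;
    pthread_mutex_unlock(&schedule_lock);
}

/* Un-claim: also schedule lock only, so it never waits on general_lock. */
static void
clear_claim(void)
{
    pthread_mutex_lock(&schedule_lock);
    slot.wi_tableoid = INVALID_OID;
    slot.wi_sharedrel = false;
    pthread_mutex_unlock(&schedule_lock);
}

/* Work that touches only the other fields needs only general_lock. */
static void
set_cost_limit(int limit)
{
    pthread_mutex_lock(&general_lock);
    slot.wi_cost_limit = limit;
    pthread_mutex_unlock(&general_lock);
}

/* A scanner takes both locks, in the same order as the diff's scan. */
static bool
is_claimed(unsigned oid)
{
    bool hit;

    pthread_mutex_lock(&schedule_lock);
    pthread_mutex_lock(&general_lock);
    hit = (slot.wi_tableoid == oid);
    pthread_mutex_unlock(&general_lock);
    pthread_mutex_unlock(&schedule_lock);
    return hit;
}
```

In this model `clear_claim` completes even while some other code holds `general_lock`, which is the concurrency benefit the commit message attributes to taking AutovacuumScheduleLock instead of AutovacuumLock for the un-claim step.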
