Fix NOTIFY to cope with I/O problems, such as out-of-disk-space. · lhcezar/postgres@3ac860f

Commit 3ac860f

Fix NOTIFY to cope with I/O problems, such as out-of-disk-space.
The LISTEN/NOTIFY subsystem got confused if SimpleLruZeroPage failed, which would typically happen as a result of a write() failure while attempting to dump a dirty pg_notify page out of memory. Subsequently, all attempts to send more NOTIFY messages would fail with messages like "Could not read from file "pg_notify/nnnn" at offset nnnnn: Success". Only restarting the server would clear this condition. Per reports from Kevin Grittner and Christoph Berg.

Back-patch to 9.0, where the problem was introduced during the LISTEN/NOTIFY rewrite.
1 parent 8a9bcf7 commit 3ac860f
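
The trailing ": Success" in the quoted message is what a C library such as glibc prints for an errno of 0. One plausible reading, assumed here and not stated in the commit message, is that the read came up short because the page had never been written, so no error code was set. The sketch below reproduces that effect outside PostgreSQL; `demo_read_missing_data` and `demo_format_report` are hypothetical names, and the exact text returned by `strerror(0)` varies by platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: try to read 16 bytes from an
 * empty temporary file.  Returns what read() returned and stores the errno
 * value observed right afterwards in *saved_errno.
 */
static ssize_t
demo_read_missing_data(int *saved_errno)
{
	char		buf[16];
	ssize_t		nread;
	FILE	   *tmp = tmpfile();

	if (tmp == NULL)
	{
		*saved_errno = errno;
		return -1;
	}

	errno = 0;					/* clear any stale error code first */
	nread = read(fileno(tmp), buf, sizeof(buf));
	*saved_errno = errno;		/* a read at end of file does not set errno */
	fclose(tmp);
	return nread;
}

/* Format a report loosely in the style of the message quoted above. */
static void
demo_format_report(char *out, size_t outlen, int saved_errno)
{
	assert(out != NULL && outlen > 0);
	snprintf(out, outlen, "could not read from file: %s",
			 strerror(saved_errno));
}
```

Calling `demo_read_missing_data` on an empty file yields a byte count of 0 with errno left at 0, so a caller that treats "fewer bytes than a full page" as a failure ends up formatting an error report with no real error code behind it.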

1 file changed: +22 -5 lines changed


src/backend/commands/async.c

Lines changed: 22 additions & 5 deletions
@@ -1306,15 +1306,29 @@ static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
 {
 	AsyncQueueEntry qe;
+	QueuePosition queue_head;
 	int pageno;
 	int offset;
 	int slotno;
 
 	/* We hold both AsyncQueueLock and AsyncCtlLock during this operation */
 	LWLockAcquire(AsyncCtlLock, LW_EXCLUSIVE);
 
+	/*
+	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
+	 * memory upon exiting. The reason for this is that if we have to advance
+	 * to a new page, SimpleLruZeroPage might fail (out of disk space, for
+	 * instance), and we must not advance QUEUE_HEAD if it does. (Otherwise,
+	 * subsequent insertions would try to put entries into a page that slru.c
+	 * thinks doesn't exist yet.) So, use a local position variable. Note
+	 * that if we do fail, any already-inserted queue entries are forgotten;
+	 * this is okay, since they'd be useless anyway after our transaction
+	 * rolls back.
+	 */
+	queue_head = QUEUE_HEAD;
+
 	/* Fetch the current page */
-	pageno = QUEUE_POS_PAGE(QUEUE_HEAD);
+	pageno = QUEUE_POS_PAGE(queue_head);
 	slotno = SimpleLruReadPage(AsyncCtl, pageno, true, InvalidTransactionId);
 	/* Note we mark the page dirty before writing in it */
 	AsyncCtl->shared->page_dirty[slotno] = true;
@@ -1326,7 +1340,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Construct a valid queue entry in local variable qe */
 		asyncQueueNotificationToEntry(n, &qe);
 
-		offset = QUEUE_POS_OFFSET(QUEUE_HEAD);
+		offset = QUEUE_POS_OFFSET(queue_head);
 
 		/* Check whether the entry really fits on the current page */
 		if (offset + qe.length <= QUEUE_PAGESIZE)
@@ -1352,8 +1366,8 @@ asyncQueueAddEntries(ListCell *nextNotify)
 				   &qe,
 				   qe.length);
 
-		/* Advance QUEUE_HEAD appropriately, and note if page is full */
-		if (asyncQueueAdvance(&(QUEUE_HEAD), qe.length))
+		/* Advance queue_head appropriately, and detect if page is full */
+		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
 			/*
 			 * Page is full, so we're done here, but first fill the next page
@@ -1363,12 +1377,15 @@ asyncQueueAddEntries(ListCell *nextNotify)
 			 * asyncQueueIsFull() ensured that there is room to create this
 			 * page without overrunning the queue.
 			 */
-			slotno = SimpleLruZeroPage(AsyncCtl, QUEUE_POS_PAGE(QUEUE_HEAD));
+			slotno = SimpleLruZeroPage(AsyncCtl, QUEUE_POS_PAGE(queue_head));
 			/* And exit the loop */
 			break;
 		}
 	}
 
+	/* Success, so update the global QUEUE_HEAD */
+	QUEUE_HEAD = queue_head;
+
 	LWLockRelease(AsyncCtlLock);
 
 	return nextNotify;
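
The idea behind the patch, advancing a private copy of the queue head and publishing it to shared state only once every step has succeeded, can be sketched in isolation. This is a minimal toy model, not the PostgreSQL code: every `demo_*` and `DEMO_*` name is hypothetical, the page size is tiny, entries are assumed to divide the page evenly, and `demo_zero_page` reports failure through a return value where the real SimpleLruZeroPage would raise an error.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_SIZE 8		/* hypothetical: bytes per page in this toy queue */

/* Hypothetical stand-in for QueuePosition: a page number plus a byte offset. */
typedef struct
{
	int			page;
	int			offset;
} DemoPosition;

/* Hypothetical stand-in for the shared QUEUE_HEAD. */
static DemoPosition demo_shared_head = {0, 0};

/* Pages that can still be created; 0 simulates running out of disk space. */
static int	demo_pages_available = 1;

/* Stand-in for SimpleLruZeroPage: false where the real call would error out. */
static bool
demo_zero_page(int page)
{
	(void) page;
	if (demo_pages_available <= 0)
		return false;
	demo_pages_available--;
	return true;
}

/* Advance pos by len bytes; returns true if that moved it onto a new page. */
static bool
demo_advance(DemoPosition *pos, int len)
{
	pos->offset += len;
	if (pos->offset >= DEMO_PAGE_SIZE)
	{
		pos->page++;
		pos->offset = 0;
		return true;
	}
	return false;
}

/*
 * Append count entries of len bytes each.  The shared head is updated only
 * if every page needed along the way could be created.
 */
static bool
demo_add_entries(int count, int len)
{
	DemoPosition local_head = demo_shared_head;	/* work on a private copy */

	assert(len > 0 && DEMO_PAGE_SIZE % len == 0);

	for (int i = 0; i < count; i++)
	{
		if (demo_advance(&local_head, len) &&
			!demo_zero_page(local_head.page))
			return false;		/* failure: shared head is left untouched */
	}

	demo_shared_head = local_head;	/* success: publish the new head */
	return true;
}
```

With `demo_pages_available` set to 0, `demo_add_entries(2, 4)` fails and `demo_shared_head` stays at page 0, offset 0, so a later call starts from a position that still refers to an existing page. Had the function advanced `demo_shared_head` directly, the failed call would have left it pointing at page 1, which was never created.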
