Support clean switchover. · kosalalakshitha/postgres@bee4a4d · GitHub

Commit bee4a4d

Support clean switchover.
In replication, when we shut down the master, walsender tries to send all the outstanding WAL records to the standby and then exits. In principle, this means that all the WAL records are fully synced between the two servers after a clean shutdown of the master. So, after promoting the standby to new master, we can restart the stopped master as a new standby without needing a fresh backup from the new master.

But there has been one problem so far: though walsender tries to send all the outstanding WAL records, it doesn't wait for them to be replicated to the standby. As a result, walreceiver can detect the closure of the connection and exit before it has received all the WAL records. We cannot guarantee that no WAL is missing in the standby after a clean shutdown of the master. In that case, a backup from the new master is required when restarting the stopped master as a new standby.

This patch fixes this problem. It just changes walsender so that it waits for all the outstanding WAL records to be replicated to the standby before closing the replication connection.

Per discussion, this is a fix that needs to be back-patched rather than a new feature. So, back-patch to 9.1, where enough infrastructure for this exists.

Patch by me, reviewed by Andres Freund.
1 parent 99ee15b commit bee4a4d

1 file changed: +6 -3 lines changed

src/backend/replication/walsender.c

Lines changed: 6 additions & 3 deletions
@@ -22,7 +22,8 @@
  * If the server is shut down, postmaster sends us SIGUSR2 after all
  * regular backends have exited and the shutdown checkpoint has been written.
  * This instruct walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, and then exit.
+ * shutdown checkpoint record, wait for it to be replicated to the standby,
+ * and then exit.
  *
  *
  * Portions Copyright (c) 2010-2012, PostgreSQL Global Development Group
@@ -800,15 +801,17 @@ WalSndLoop(void)
 
         /*
          * When SIGUSR2 arrives, we send any outstanding logs up to the
-         * shutdown checkpoint record (i.e., the latest record) and exit.
+         * shutdown checkpoint record (i.e., the latest record), wait
+         * for them to be replicated to the standby, and exit.
          * This may be a normal termination at shutdown, or a promotion,
          * the walsender is not sure which.
          */
         if (walsender_ready_to_stop)
         {
             /* ... let's just be real sure we're caught up ... */
             XLogSend(output_message, &caughtup);
-            if (caughtup && !pq_is_send_pending())
+            if (caughtup && XLByteEQ(sentPtr, MyWalSnd->flush) &&
+                !pq_is_send_pending())
             {
                 walsender_shutdown_requested = true;
                 continue;       /* don't want to wait more */
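
To make the effect of the changed condition concrete, here is a minimal standalone sketch of the exit test before and after the patch. It is not code from the PostgreSQL tree: `WalPos`, `ShutdownState`, `old_exit_test`, `new_exit_test` and `demo_exit_tests` are hypothetical names, and a plain 64-bit integer stands in for the WAL position type that the real code compares with `XLByteEQ`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a plain integer stands in for the real WAL position type. */
typedef uint64_t WalPos;

/* Hypothetical snapshot of what walsender looks at once SIGUSR2 has arrived. */
typedef struct
{
    bool   caughtup;      /* all WAL up to the latest record has been sent */
    bool   send_pending;  /* bytes are still waiting in the output buffer */
    WalPos sent;          /* models sentPtr */
    WalPos flush;         /* models MyWalSnd->flush, as reported by the standby */
} ShutdownState;

/* Exit test before the patch: ignores what the standby has acknowledged. */
static bool
old_exit_test(const ShutdownState *s)
{
    return s->caughtup && !s->send_pending;
}

/* Exit test after the patch: the standby's flush position must equal sent. */
static bool
new_exit_test(const ShutdownState *s)
{
    return s->caughtup && s->sent == s->flush && !s->send_pending;
}

/* Usage check: a lagging standby passes the old test but not the new one. */
static void
demo_exit_tests(void)
{
    ShutdownState lagging = { .caughtup = true, .send_pending = false,
                              .sent = 1000, .flush = 900 };
    ShutdownState synced  = { .caughtup = true, .send_pending = false,
                              .sent = 1000, .flush = 1000 };

    assert(old_exit_test(&lagging));   /* old code would close the connection */
    assert(!new_exit_test(&lagging));  /* new code keeps waiting */
    assert(new_exit_test(&synced));    /* both agree once the standby caught up */
}
```

In this model, the `lagging` case is the window the commit message describes: everything has been sent, but the standby has not yet confirmed it, so closing the connection at that point could leave WAL missing on the standby.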
