Improve replication lag interpolation after idle period · postgrespro/postgres@9ea3c64 · GitHub
  • Commit 9ea3c64

    Improve replication lag interpolation after idle period
    After sitting idle and fully replayed for a while and then encountering a new burst of WAL activity, we interpolate between an ancient sample and the not-yet-reached one for the new traffic. That produced a corner case report of lag after receiving first new reply from standby, which might sometimes be a large spike.

    Correct this by resetting last_read time and handle that new case.

    Author: Thomas Munro
    1 parent a79122b commit 9ea3c64
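
The spike described in the commit message can be reproduced with a little arithmetic. The following is a minimal sketch, not the walsender code itself: `Lsn`, `TimeUs`, `Sample`, and the three functions are hypothetical stand-ins (for `XLogRecPtr`, `TimestampTz`, and `WalTimeSample`), and the linear interpolation is assumed to work as the commit message describes. It compares the lag obtained by interpolating from a stale pre-idle sample with the lag obtained from the future sample alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified types; the real ones are defined in PostgreSQL. */
typedef uint64_t Lsn;     /* stands in for XLogRecPtr */
typedef int64_t  TimeUs;  /* stands in for TimestampTz, in microseconds */

typedef struct
{
    Lsn    lsn;
    TimeUs time;
} Sample;

/*
 * Linearly interpolate the time at which 'lsn' was written, given one
 * sample at or before it and one sample after it.
 */
TimeUs
interpolate_time(Sample prev, Sample next, Lsn lsn)
{
    double fraction;

    assert(prev.lsn <= lsn && lsn < next.lsn);
    fraction = (double) (lsn - prev.lsn) / (double) (next.lsn - prev.lsn);
    return (TimeUs) ((double) prev.time +
                     (double) (next.time - prev.time) * fraction);
}

/* Lag when 'prev' is an old sample left over from before the idle period. */
TimeUs
lag_with_stale_sample(Sample prev, Sample next, Lsn lsn, TimeUs now)
{
    return now - interpolate_time(prev, next, lsn);
}

/* Lag when the stale sample is ignored and only the future sample is used. */
TimeUs
lag_with_future_sample_only(Sample next, TimeUs now)
{
    return now - next.time;
}
```

For example, take a last-read sample at LSN 1000 and time 0, an hour of idleness, and a new sample at LSN 2000 taken at 3600 s. If the standby reports LSN 1500 one second later, interpolating from the stale sample yields a lag of about 1801 s, while using only the future sample yields 1 s.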

    1 file changed: +25 -4 lines changed


    src/backend/replication/walsender.c

    Lines changed: 25 additions & 4 deletions
    @@ -3443,6 +3443,16 @@ LagTrackerRead(int head, XLogRecPtr lsn, TimestampTz now)
                 (LagTracker.read_heads[head] + 1) % LAG_TRACKER_BUFFER_SIZE;
         }
     
    +    /*
    +     * If the lag tracker is empty, that means the standby has processed
    +     * everything we've ever sent so we should now clear 'last_read'. If we
    +     * didn't do that, we'd risk using a stale and irrelevant sample for
    +     * interpolation at the beginning of the next burst of WAL after a period
    +     * of idleness.
    +     */
    +    if (LagTracker.read_heads[head] == LagTracker.write_head)
    +        LagTracker.last_read[head].time = 0;
    +
         if (time > now)
         {
             /* If the clock somehow went backwards, treat as not found. */
    @@ -3459,9 +3469,14 @@ LagTrackerRead(int head, XLogRecPtr lsn, TimestampTz now)
              * eventually start moving again and cross one of our samples before
              * we can show the lag increasing.
              */
    -        if (LagTracker.read_heads[head] != LagTracker.write_head &&
    -            LagTracker.last_read[head].time != 0)
    +        if (LagTracker.read_heads[head] == LagTracker.write_head)
             {
    +            /* There are no future samples, so we can't interpolate. */
    +            return -1;
    +        }
    +        else if (LagTracker.last_read[head].time != 0)
    +        {
    +            /* We can interpolate between last_read and the next sample. */
                 double fraction;
                 WalTimeSample prev = LagTracker.last_read[head];
                 WalTimeSample next = LagTracker.buffer[LagTracker.read_heads[head]];
    @@ -3494,8 +3509,14 @@ LagTrackerRead(int head, XLogRecPtr lsn, TimestampTz now)
                 }
             else
             {
    -            /* Couldn't interpolate due to lack of data. */
    -            return -1;
    +            /*
    +             * We have only a future sample, implying that we were entirely
    +             * caught up but and now there is a new burst of WAL and the
    +             * standby hasn't processed the first sample yet. Until the
    +             * standby reaches the future sample the best we can do is report
    +             * the hypothetical lag if that sample were to be replayed now.
    +             */
    +            time = LagTracker.buffer[LagTracker.read_heads[head]].time;
             }
         }
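
Taken together, the hunks above turn a two-way decision into a three-way one for the case where the standby's reported LSN has not yet crossed a buffered sample. The sketch below models only that decision, under stated assumptions: `TimeSample` and `fallback_lag` are hypothetical names, `buffer_empty` stands in for the `read_heads[head] == write_head` test, and the additional checks that the real function performs in the parts of `LagTrackerRead` not shown in this diff are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for WalTimeSample; times are in microseconds. */
typedef struct
{
    uint64_t lsn;
    int64_t  time;
} TimeSample;

/*
 * Simplified model of the patched fallback path. Returns the estimated lag
 * in microseconds, or -1 when no estimate is possible.
 *
 * buffer_empty: no future sample is buffered.
 * last_read:    last sample consumed; time == 0 means "cleared".
 * next:         oldest not-yet-reached sample (ignored if buffer_empty).
 */
int64_t
fallback_lag(bool buffer_empty, TimeSample last_read, TimeSample next,
             uint64_t lsn, int64_t now)
{
    int64_t time;

    if (buffer_empty)
    {
        /* There are no future samples, so we can't interpolate. */
        return -1;
    }
    else if (last_read.time != 0)
    {
        /* Interpolate between last_read and the next sample. */
        double fraction;

        assert(next.lsn > last_read.lsn);
        assert(last_read.lsn <= lsn && lsn < next.lsn);
        fraction = (double) (lsn - last_read.lsn) /
            (double) (next.lsn - last_read.lsn);
        time = (int64_t) ((double) last_read.time +
                          (double) (next.time - last_read.time) * fraction);
    }
    else
    {
        /* Only a future sample: report the lag as if it were replayed now. */
        time = next.time;
    }

    return now - time;
}
```

In this model the third branch is reachable only because the first hunk clears `last_read` once the buffer drains; before the patch, the same situation either interpolated from the stale sample or returned -1.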
