

Pre-switchover cookbook testing
Closed, Resolved, Public

Description

As described in https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Testing_-_3_weeks_before:

  1. [done] Test sre.switchdc.mediawiki and sre.discovery.datacenter in --dry-run mode
  2. [done] Test sre.switchdc.mediawiki in --live-test mode (with reversed DC direction)

While #2 "should be fine," I plan to run it at a time when more service-ops folks are around.

Event Timeline

Change #1072612 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/cookbooks@master] sre.discovery: set timeout in raw dns.query.udp

https://gerrit.wikimedia.org/r/1072612
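
The patch concerns dnspython's `dns.query.udp`, whose `timeout` argument defaults to `None`, meaning it waits indefinitely for a reply. As a rough, stdlib-only illustration of why an explicit timeout matters (this is not the cookbook's code; `build_query` and `udp_query` are hypothetical helpers), a raw UDP DNS query with a bounded wait might look like:

```python
import socket
import struct
from typing import Optional


def build_query(qname: str, qtype: int = 1) -> bytes:
    """Encode a minimal DNS query with QCLASS IN; qtype 1 is an A record."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    # QNAME as length-prefixed labels, terminated by the zero-length root label.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def udp_query(qname: str, server: str, port: int = 53,
              timeout: float = 2.0) -> Optional[bytes]:
    """Send one query over UDP; return the raw reply, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Without a timeout, recvfrom() blocks forever if the server
        # never answers.
        sock.settimeout(timeout)
        sock.sendto(build_query(qname), (server, port))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
        return reply
```

With the timeout set, a non-responding server costs the caller at most `timeout` seconds instead of hanging the whole check.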

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.00-disable-puppet for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:02.267352

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.00-downtime-db-readonly-checks for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:18.989607

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.00-reduce-ttl for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:05:44.945083

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:15.073548

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.02-set-readonly for datacenter switchover from codfw to eqiad - [DRY-RUN] MediaWiki read-only period starts at: 2024-09-16 14:54:20.136310

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.02-set-readonly for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:14.714100

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.03-set-db-readonly for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:34.669890

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.04-switch-mediawiki for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:13.624995

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.06-set-db-readwrite for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:02.542766

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.07-set-readwrite for datacenter switchover from codfw to eqiad - [DRY-RUN] MediaWiki read-only period ends at: 2024-09-16 14:57:30.267664

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.07-set-readwrite for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:05.255255

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.08-restart-mw-jobrunner for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:38.628905

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.08-start-maintenance for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:02:01.776178

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.09-restore-ttl for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:39.516822

swfrench@cumin1002 - Cookbook cookbooks.sre.switchdc.mediawiki.09-run-puppet-on-db-masters for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:10:29.520875
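
The two `[DRY-RUN]` timestamps logged by 02-set-readonly and 07-set-readwrite bound the read-only window as recorded during this test run. Subtracting them (a quick stdlib check, not part of the cookbooks) gives a little over three minutes:

```python
from datetime import datetime, timedelta

# Timestamp format as it appears in the cookbook log lines above.
LOG_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def read_only_window(start: str, end: str) -> timedelta:
    """Elapsed time between the logged read-only start and end timestamps."""
    return datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)


window = read_only_window("2024-09-16 14:54:20.136310", "2024-09-16 14:57:30.267664")
print(window)  # 0:03:10.131354
```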

No new issues were discovered during today's live test. A couple of documentation tweaks were needed as follow-ups (e.g., 03-set-db-readonly no longer fails in the absence of circular replication), and those are now done.

There's one additional nice-to-have change, which I'll post shortly.

Change #1073291 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/cookbooks@master] sre.switchdc.mediawiki: show TTL sleep end time

https://gerrit.wikimedia.org/r/1073291
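
The idea behind this change can be sketched as computing and logging the wall-clock end of the TTL wait alongside its duration, so an operator knows when the wait will finish. This is a minimal sketch assuming hypothetical `ttl_sleep_end` and `describe_ttl_sleep` helpers and an arbitrary 300-second TTL; none of these come from the cookbook:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def ttl_sleep_end(ttl_seconds: int, now: Optional[datetime] = None) -> datetime:
    """Return the wall-clock time at which a TTL-length sleep will finish."""
    start = now if now is not None else datetime.now(timezone.utc)
    return start + timedelta(seconds=ttl_seconds)


def describe_ttl_sleep(ttl_seconds: int, now: Optional[datetime] = None) -> str:
    # Include the end time, not just the duration, in the log message.
    end = ttl_sleep_end(ttl_seconds, now)
    return (f"Sleeping {ttl_seconds}s for cached records to expire, "
            f"until {end.isoformat(timespec='seconds')}")


print(describe_ttl_sleep(300, datetime(2024, 9, 16, 14, 0, tzinfo=timezone.utc)))
```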

Change #1073524 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/cookbooks@master] sre.discovery.datacenter: restrict checks to active authdns hosts

https://gerrit.wikimedia.org/r/1073524
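
In rough terms, restricting the checks means verifying answers only from authdns hosts that are currently pooled, so a host deliberately taken out of service cannot fail the check. The sketch below is hypothetical throughout: the `AuthDNSHost` type, its `pooled` flag, the host names, and the injected `resolve` callable are illustrative assumptions, not the cookbook's data model (and how depooled hosts should be handled is still under discussion, per the last comment on this task):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AuthDNSHost:
    name: str
    pooled: bool  # hypothetical flag: True if the host is currently serving queries


def check_active_hosts(hosts: List[AuthDNSHost], expected: str,
                       resolve: Callable[[str], str]) -> Dict[str, bool]:
    """Compare each pooled host's answer with the expected record.

    Depooled hosts are skipped entirely rather than checked.
    """
    return {h.name: resolve(h.name) == expected for h in hosts if h.pooled}


hosts = [AuthDNSHost("dns-a", pooled=True), AuthDNSHost("dns-b", pooled=False)]
answers = {"dns-a": "eqiad", "dns-b": "codfw"}  # hypothetical per-host answers
print(check_active_hosts(hosts, "eqiad", answers.__getitem__))
```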

Change #1073291 merged by jenkins-bot:

[operations/cookbooks@master] sre.switchdc.mediawiki: show TTL sleep end time

https://gerrit.wikimedia.org/r/1073291

Change #1072612 merged by jenkins-bot:

[operations/cookbooks@master] sre.discovery: set timeout in raw dns.query.udp

https://gerrit.wikimedia.org/r/1072612

With those two minor changes merged, I believe there's nothing else explicitly tracked here. I've opened T375285 for the ongoing discussion about how sre.discovery.datacenter deals with depooled authdns hosts.