kafka-main200[6789] and kafka-main2010 implementation tracking
Closed, Resolved (Public)

Description

Per serviceops request, all hosts being imaged and set up by DC Ops will have a sub-task for tracking the serviceops implementation.

This task is for #service-ops. All questions regarding the status of the hosts should be directed to the parent task T363209. Once that task is resolved, the work tracked here can take place.

The replacement plan (applied to each node in the cluster) is discussed in T373189: Establish a proper process for replacing kafka nodes
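
For orientation, the patch titles in the timeline pair each old host with one new host (2001 with 2006, 2002 with 2007, and so on), and several deployment-charts patches then update kafka-main connection strings. A minimal sketch of that kind of hostname swap follows. The function name, the comma-separated `host:port` format, and port 9093 are assumptions made for illustration; they are not taken from the patches themselves.

```python
# Old -> new host pairs, as named in the puppet patch titles on this task.
REPLACEMENTS = {
    "kafka-main2001": "kafka-main2006",
    "kafka-main2002": "kafka-main2007",
    "kafka-main2003": "kafka-main2008",
    "kafka-main2004": "kafka-main2009",
    "kafka-main2005": "kafka-main2010",
}


def update_connection_string(conn: str) -> str:
    """Swap old broker hostnames for their replacements.

    Hypothetical helper: assumes a comma-separated list of fully
    qualified "host.domain:port" entries. Unknown hosts are left as-is.
    """
    brokers = []
    for broker in conn.split(","):
        # Split off the short hostname (everything before the first dot).
        host, sep, rest = broker.strip().partition(".")
        brokers.append(REPLACEMENTS.get(host, host) + sep + rest)
    return ",".join(brokers)


if __name__ == "__main__":
    # Port 9093 is an assumed value for the example only.
    print(update_connection_string(
        "kafka-main2001.codfw.wmnet:9093,kafka-main2002.codfw.wmnet:9093"
    ))
```

Keeping the mapping in one table makes it easy to check that every old host has exactly one replacement before touching any client configuration.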

Event Timeline

JMeybohm renamed this task from serviceops kafka-main200[6789] and kafka-main2010 implementation tracking to kafka-main200[6789] and kafka-main2010 implementation tracking. (Jun 28 2024, 9:49 AM)
JMeybohm claimed this task.
JMeybohm changed the task status from Stalled to In Progress. (Aug 13 2024, 9:17 AM)

Change #1064714 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] site.pp: Split node blocks of new kafka nodes into two

https://gerrit.wikimedia.org/r/1064714

Change #1064715 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kafka-main: Replace kafka-main2001 with kafka-main2006

https://gerrit.wikimedia.org/r/1064715

Icinga downtime and Alertmanager silence (ID=6087c1fc-d7cf-44c1-8f1a-328699314b21) set by jayme@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: Hardware refresh

kafka-main2001.codfw.wmnet

Change #1064714 merged by JMeybohm:

[operations/puppet@production] site.pp: Split node blocks of new kafka nodes into two

https://gerrit.wikimedia.org/r/1064714

Change #1064715 merged by JMeybohm:

[operations/puppet@production] kafka-main: Replace kafka-main2001 with kafka-main2006

https://gerrit.wikimedia.org/r/1064715

Icinga downtime and Alertmanager silence (ID=962b8f3e-5b6f-4ce2-b907-37b801206cda) set by jayme@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: Hardware refresh

kafka-main2006.codfw.wmnet

Change #1064730 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Add replacement kafka nodes to kafka_brokers_main

https://gerrit.wikimedia.org/r/1064730

Change #1064730 merged by JMeybohm:

[operations/puppet@production] Add replacement kafka nodes to kafka_brokers_main

https://gerrit.wikimedia.org/r/1064730

Change #1064758 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Update various kafka-main connection strings

https://gerrit.wikimedia.org/r/1064758

Change #1064758 merged by jenkins-bot:

[operations/deployment-charts@master] Update various kafka-main connection strings

https://gerrit.wikimedia.org/r/1064758

Icinga downtime and Alertmanager silence (ID=61035eab-a385-4c69-8001-e2c6caf52946) set by jayme@cumin1002 for 2 days, 0:00:00 on 1 host(s) and their services with reason: Hardware refresh

kafka-main2001.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=1f9d2e89-ff2c-47c0-ae0e-3a1dd3fcd648) set by jayme@cumin1002 for 4 days, 0:00:00 on 1 host(s) and their services with reason: Decom next week

kafka-main2001.codfw.wmnet

Change #1071610 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kafka-main: Replace kafka-main2002 with kafka-main2007

https://gerrit.wikimedia.org/r/1071610

Mentioned in SAL (#wikimedia-operations) [2024-09-10T07:55:27Z] <jayme> evacuating leadership for all partitions assigned to broker id 2002 on kafka-main-codfw - T363210
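
The log does not show which tool performed the leadership evacuation. As a rough illustration of the idea only: in Kafka the first broker in a partition's replica list is the preferred leader, so one common approach is to move the broker to the end of that list wherever it is first and then trigger a preferred leader election. The sketch below builds such a plan in the JSON layout used by Kafka's partition reassignment tooling. The function name and the sample topic are hypothetical.

```python
import json


def evacuate_leadership(assignment: dict, broker_id: int) -> dict:
    """Build a reassignment plan that demotes broker_id as preferred leader.

    Hypothetical helper: for every partition whose replica list starts with
    broker_id, move that broker to the end of the list. Partitions that are
    unaffected, or that have no other replica, are left out of the plan.
    """
    plan = {"version": 1, "partitions": []}
    for part in assignment["partitions"]:
        replicas = list(part["replicas"])
        if len(replicas) > 1 and replicas[0] == broker_id:
            plan["partitions"].append({
                "topic": part["topic"],
                "partition": part["partition"],
                "replicas": replicas[1:] + [broker_id],
            })
    return plan


if __name__ == "__main__":
    # Made-up current assignment; "example.topic" is not a real topic name.
    current = {
        "version": 1,
        "partitions": [
            {"topic": "example.topic", "partition": 0, "replicas": [2002, 2003, 2004]},
            {"topic": "example.topic", "partition": 1, "replicas": [2003, 2002, 2004]},
        ],
    }
    print(json.dumps(evacuate_leadership(current, 2002), indent=2))
```

Restoring leadership afterwards would amount to applying the original replica order again and rerunning the election.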

Icinga downtime and Alertmanager silence (ID=1ecd31b5-5c44-49dc-a69c-a3104ecc9241) set by jayme@cumin1002 for 1 day, 0:00:00 on 2 host(s) and their services with reason: Hardware refresh

kafka-main[2002,2007].codfw.wmnet

Change #1071610 merged by JMeybohm:

[operations/puppet@production] kafka-main: Replace kafka-main2002 with kafka-main2007

https://gerrit.wikimedia.org/r/1071610

Change #1071844 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: let purged use closest cluster on codfw, ulsfo and eqsin

https://gerrit.wikimedia.org/r/1071844

Change #1071880 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Replace kafka-main2002 with kafka-main2006

https://gerrit.wikimedia.org/r/1071880

Change #1071880 merged by jenkins-bot:

[operations/deployment-charts@master] Replace kafka-main2002 with kafka-main2006

https://gerrit.wikimedia.org/r/1071880

Mentioned in SAL (#wikimedia-operations) [2024-09-10T14:25:25Z] <jayme> restoring leadership for partitions assigned to broker id 2002 on kafka-main-codfw - T363210

Change #1072138 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kafka-main: Replace kafka-main2003 with kafka-main2008

https://gerrit.wikimedia.org/r/1072138

Mentioned in SAL (#wikimedia-operations) [2024-09-11T07:49:40Z] <jayme> evacuating leadership for all partitions assigned to broker id 2003 on kafka-main-codfw - T363210

Icinga downtime and Alertmanager silence (ID=9c649759-0814-4af1-9b77-d1cbc2c297aa) set by jayme@cumin1002 for 1 day, 0:00:00 on 2 host(s) and their services with reason: Hardware refresh

kafka-main[2003,2008].codfw.wmnet

Change #1072138 merged by JMeybohm:

[operations/puppet@production] kafka-main: Replace kafka-main2003 with kafka-main2008

https://gerrit.wikimedia.org/r/1072138

Change #1072182 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kafka-main: Fix regex for kafka-main in codfw

https://gerrit.wikimedia.org/r/1072182

Change #1072182 merged by JMeybohm:

[operations/puppet@production] kafka-main: Fix regex for kafka-main in codfw

https://gerrit.wikimedia.org/r/1072182

Mentioned in SAL (#wikimedia-operations) [2024-09-11T12:12:35Z] <jayme> restoring leadership for partitions assigned to broker id 2003 on kafka-main-codfw - T363210

Change #1072210 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Replace kafka-main2003 with kafka-main2008

https://gerrit.wikimedia.org/r/1072210

Change #1072210 merged by jenkins-bot:

[operations/deployment-charts@master] Replace kafka-main2003 with kafka-main2008

https://gerrit.wikimedia.org/r/1072210

Mentioned in SAL (#wikimedia-operations) [2024-09-11T14:43:15Z] <jayme> deployed changeprop-jobqueue changeprop cirrus-streaming-updater eventgate-main eventstreams mw-page-content-change-enrich rdf-streaming-updater for kafka connection string updates - T363210

Mentioned in SAL (#wikimedia-operations) [2024-09-12T06:33:19Z] <jayme> evacuating leadership for all partitions assigned to broker id 2004 on kafka-main-codfw - T363210

Icinga downtime and Alertmanager silence (ID=358ccffc-965f-4494-af81-ec1629049541) set by jayme@cumin1002 for 1 day, 0:00:00 on 2 host(s) and their services with reason: Hardware refresh

kafka-main[2004,2009].codfw.wmnet

Change #1072441 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kafka-main: Replace kafka-main2004 with kafka-main2009

https://gerrit.wikimedia.org/r/1072441

Change #1072441 merged by JMeybohm:

[operations/puppet@production] kafka-main: Replace kafka-main2004 with kafka-main2009

https://gerrit.wikimedia.org/r/1072441

Change #1072472 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Replace kafka-main2004 with kafka-main2009

https://gerrit.wikimedia.org/r/1072472

Change #1072485 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] mw-page-content-change-enrich: fix kafka values.

https://gerrit.wikimedia.org/r/1072485

Mentioned in SAL (#wikimedia-operations) [2024-09-12T08:49:25Z] <jayme> restoring leadership for all partitions assigned to broker id 2004 on kafka-main-codfw - T363210

Change #1072485 merged by jenkins-bot:

[operations/deployment-charts@master] mw-page-content-change-enrich: fix kafka values.

https://gerrit.wikimedia.org/r/1072485

Change #1072472 merged by jenkins-bot:

[operations/deployment-charts@master] Replace kafka-main2004 with kafka-main2009

https://gerrit.wikimedia.org/r/1072472

Mentioned in SAL (#wikimedia-operations) [2024-09-13T06:54:52Z] <jayme> evacuating leadership for all partitions assigned to broker id 2005 on kafka-main-codfw - T363210

Icinga downtime and Alertmanager silence (ID=a5b4f45c-79cb-4b77-b41e-ea8a9d2a6286) set by jayme@cumin1002 for 1 day, 0:00:00 on 2 host(s) and their services with reason: Hardware refresh

kafka-main[2005,2010].codfw.wmnet

Change #1072662 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kafka-main: Replace kafka-main2005 with kafka-main2010

https://gerrit.wikimedia.org/r/1072662

Change #1072663 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Replace kafka-main2005 with kafka-main2010

https://gerrit.wikimedia.org/r/1072663

Change #1072662 merged by JMeybohm:

[operations/puppet@production] kafka-main: Replace kafka-main2005 with kafka-main2010

https://gerrit.wikimedia.org/r/1072662

Change #1072663 merged by jenkins-bot:

[operations/deployment-charts@master] Replace kafka-main2005 with kafka-main2010

https://gerrit.wikimedia.org/r/1072663

Mentioned in SAL (#wikimedia-operations) [2024-09-13T09:14:57Z] <jayme> restoring leadership for all partitions assigned to broker id 2005 on kafka-main-codfw - T363210

Change #1071844 merged by Vgutierrez:

[operations/puppet@production] hiera: let purged use closest cluster on codfw, ulsfo and eqsin

https://gerrit.wikimedia.org/r/1071844

Change #1072720 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Switch purged@cp2037 back to main-codfw

https://gerrit.wikimedia.org/r/1072720

Change #1072720 merged by Vgutierrez:

[operations/puppet@production] hiera: Switch purged@cp2037 back to main-codfw

https://gerrit.wikimedia.org/r/1072720

Change #1072753 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: switch purged@codfw,ulsfo,eqsin back to codfw kafka cluster

https://gerrit.wikimedia.org/r/1072753

Change #1072753 merged by Vgutierrez:

[operations/puppet@production] hiera: switch purged@codfw,ulsfo,eqsin back to codfw kafka cluster

https://gerrit.wikimedia.org/r/1072753