User Details
- User Since
- Apr 1 2015, 4:33 PM (492 w, 5 d)
- Availability
- Available
- LDAP User
- Moritz Mühlenhoff
- MediaWiki User
- MMuhlenhoff (WMF)
Today
The override is now fixed on the Debian archive side and bullseye installations should work again. Please reopen if you still see reimages failing.
Yesterday
mw2379 is also still in Puppetboard: https://puppetboard.wikimedia.org/catalog/mw2379.codfw.wmnet
@Dzahn gerrit1004 is still in PuppetDB: https://puppetboard.wikimedia.org/catalog/gerrit1004.wikimedia.org
Something went wrong with the 2430 rename, it's still showing up in Puppetboard: https://puppetboard.wikimedia.org/node/mw2430.codfw.wmnet
Fri, Sep 6
I think
I've uploaded a fixed bullseye build to apt.wikimedia.org and upgraded build2001 (the remaining bullseye hosts are WIP); that should unbreak the next docker-report run.
All done!
Thu, Sep 5
I've kicked off the RAID rebuild; it should complete in half an hour. I've also returned puppetmaster1003 to active duty.
@VRiley-WMF puppetmaster1003 has been taken out of active duty and I've set downtime; you can proceed with the drive swap at any time.
Any host that switches from iptables/ferm to nftables strictly needs a reboot after the provider has been changed. Some of the kernel modules used by iptables cannot be unloaded at runtime (I tried various -f hacks, but to no avail). If the old iptables kernel modules are still loaded, the constants formerly defined by ferm persist, and this is what we are seeing here: the hosts don't know that alert2002 is in the new global list of monitoring hosts.
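A minimal sketch of how one might check whether a host that was switched to nftables still has legacy iptables modules loaded, and therefore still needs the reboot. It only parses `/proc/modules`; the set of module names treated as "legacy iptables" (`ip_tables`, `ip6_tables` and the `iptable_*`/`ip6table_*` table modules) is an assumption, not an exhaustive list.

```python
from pathlib import Path
from typing import List

# Assumption: module names that indicate legacy iptables is still loaded.
# This list is illustrative, not exhaustive.
LEGACY_EXACT = {"ip_tables", "ip6_tables"}
LEGACY_PREFIXES = ("iptable_", "ip6table_")


def legacy_iptables_modules(proc_modules_text: str) -> List[str]:
    """Return the legacy iptables modules found in /proc/modules-style text."""
    found = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]  # the first column of /proc/modules is the module name
        if name in LEGACY_EXACT or name.startswith(LEGACY_PREFIXES):
            found.append(name)
    return sorted(found)


def needs_reboot(proc_modules_text: str) -> bool:
    """After switching to nftables, any loaded legacy module means a reboot is still due."""
    return bool(legacy_iptables_modules(proc_modules_text))


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():  # only present on Linux
        mods = legacy_iptables_modules(proc.read_text())
        if mods:
            print("reboot needed, still loaded:", ", ".join(mods))
        else:
            print("no legacy iptables modules loaded")
```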
Tue, Sep 3
Nice! I suppose the disk swap needs downtime? Then I'll take the server out of rotation on Thursday morning (I'm off tomorrow).
Tracking bug is https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1080418
I think this might be a bug in the latest systemd update for LTS:
deb11u5 is from the point release, deb11u6 is from https://lists.debian.org/debian-lts-announce/2024/09/msg00001.html (released yesterday)
This got superseded by https://phabricator.wikimedia.org/T328331
The tool has been fixed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/761029, the manager information is now correctly displayed.
This was intended as a workaround for VMs running on Ganeti servers with 1G NICs and Java-based workloads that have a lot of memory activity. We started buying 10G NICs for all server refreshes, and by now (and once the next refresh is done) the old systems should be mostly gone. As such, this is no longer needed.
Mon, Sep 2
I'll look into a fix
That is already tracked as T348876; I'll merge that in.
Jun 28 2024
Let's install this server directly with Puppet 7; at this point there should be no Puppet 5/7 compatibility issues in the deployment-server manifests.
Jun 27 2024
And FYI, https://gerrit.wikimedia.org/g/operations/software/bitu-ldap is a wrapper for simplifying LDAP operations within Wikimedia (originally written for Bitu, but other Python tools also use it). It should be helpful for writing the dump script.
Using ldap-maint1001 has the benefit that it already makes r/w changes to the r/w slapd servers. Currently we don't restrict that, but we've been gradually shifting read-only access to the replicas only, and I'd like to reach a state where the only r/w changes to our LDAP come from Horizon (for Cloud VPS access management), ldap-maint and Bitu, and all other production hosts are denied access via firewall rules.
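The target state described above can be summarised as a small allowlist model. This is a hypothetical sketch: the client labels `horizon`, `ldap-maint` and `bitu` stand for the three r/w writers mentioned above and are not real Puppet role or host names, and the real enforcement would be firewall rules rather than Python.

```python
from typing import Optional

# Hypothetical labels for the only clients meant to keep r/w access in the
# target state; these are not actual host or role names.
RW_CLIENTS = frozenset({"horizon", "ldap-maint", "bitu"})


def may_reach_rw_servers(client: str) -> bool:
    """Firewall decision: only allowlisted clients may connect to the r/w slapd servers."""
    return client in RW_CLIENTS


def pick_server(client: str, write: bool) -> Optional[str]:
    """Return which LDAP tier a request should use, or None if it has no path."""
    if may_reach_rw_servers(client):
        return "rw"
    if write:
        return None  # other production hosts cannot write at all
    return "replica"  # read-only access is served by the replicas


if __name__ == "__main__":
    for client, write in [("bitu", True), ("some-app", False), ("some-app", True)]:
        print(client, "write" if write else "read", "->", pick_server(client, write))
```

Keeping the allowlist as a single set mirrors the intent that adding a new writer should be an explicit, reviewable change.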
One thing that we could do is to
I'll take care of this when I'm back from sabbatical
The old nodes have been decommissioned, all done.
Why bullseye? This should be bookworm. docker-registry is packaged in Debian, so we can simply use bookworm and its package. In fact, we are already using the bookworm package on the existing registry hosts (2.8.2+ds1-1).
Jun 26 2024
Jun 25 2024
Indeed, CGO_ENABLED=0 rings a bell.
The dependency is added because some feature in the compiled Go code uses syscalls which were only wired up in glibc 2.34 (maybe openat() et al.). We ran into this problem before and there was a Go build flag to force it to use a fallback. I can't find a reference currently, but maybe Filippo remembers when he's back.
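To check which glibc a given build actually requires, a crude sketch is to scan the binary for its `GLIBC_x.y` symbol-version strings and take the highest one; `readelf -V` or `objdump -T` give the authoritative view. This assumes the version strings appear verbatim in the file, as they do in dynamically linked ELF binaries; a binary built without cgo (`CGO_ENABLED=0`) is expected to report none.

```python
import re
import sys
from pathlib import Path
from typing import Optional, Tuple

# Matches symbol-version strings such as GLIBC_2.34 or GLIBC_2.2.5.
_GLIBC_RE = re.compile(rb"GLIBC_(\d+)\.(\d+)(?:\.(\d+))?")


def max_glibc_version(data: bytes) -> Optional[Tuple[int, ...]]:
    """Return the highest GLIBC_x.y[.z] version referenced in the binary, if any."""
    versions = [
        tuple(int(part) for part in m.groups() if part is not None)
        for m in _GLIBC_RE.finditer(data)
    ]
    return max(versions) if versions else None


if __name__ == "__main__":
    # Usage: python3 glibc_check.py /path/to/binary [...]
    for path in sys.argv[1:]:
        version = max_glibc_version(Path(path).read_bytes())
        shown = ".".join(map(str, version)) if version else "none"
        print(f"{path}: highest GLIBC version referenced: {shown}")
```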
The Buster instances have been removed.
Jun 24 2024
Given that this is a statically linked Go ELF binary, we can also simply build on bookworm and copy the deb over to bullseye-wikimedia; we're doing this for other exporters as well. buster might be tricky due to its old libc6, but we can also ignore it: there are fewer than 150 buster hosts left and they can simply live with the old IPMI monitoring.
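Whether a bookworm build can be copied to older releases comes down to the newest glibc symbol version the binary references, if any. A minimal sketch of that check, using the glibc versions shipped in buster (2.28), bullseye (2.31) and bookworm (2.36); the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# glibc upstream versions shipped in these Debian releases.
DEBIAN_GLIBC: Dict[str, Tuple[int, ...]] = {
    "buster": (2, 28),
    "bullseye": (2, 31),
    "bookworm": (2, 36),
}


def releases_that_can_run(required: Tuple[int, ...]) -> List[str]:
    """List releases whose glibc is at least `required`.

    A binary referencing GLIBC_x.y symbols will not load against an older glibc,
    so an empty `required` tuple (no glibc dependency) matches every release.
    """
    return [release for release, shipped in DEBIAN_GLIBC.items() if shipped >= required]
```

For example, `releases_that_can_run((2, 34))` returns `["bookworm"]`, while `releases_that_can_run(())` returns all three releases.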
Very nice!
CAS 7.0 (what we are currently migrating to) removed the memcached backend. As such, this change won't be needed anymore for the idp servers; I'll tick them off.
Jun 20 2024
Did one of these changes possibly break PCC here?
https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler-test/3739/console
And prior to the migration, puppetserver1001 needs to be allowed in profile::tcpircbot.
The task can be closed, or is there anything still open?