Issues reimaging kubernetes workers due to user conflicts in systemd-timesyncd
Closed, Resolved, Public

Description

It seems that hosts are having an issue installing systemd-timesyncd=247.3-7+deb11u6, possibly due to a difference in script behaviours between deb11u5 and deb11u6. During a puppet run post-imaging, the user systemd-timesync is being created with uid 999, which based on our adduser configuration is not a system user, and so the installation fails. I'm not certain that the package is at fault here, but the timing of this package release makes me suspicious - servers imaged last week with the old version did not have this issue.

For a host that received deb11u5:

hnowlan@wikikube-worker2050:~$ id systemd-timesync
uid=105(systemd-timesync) gid=111(systemd-timesync) groups=111(systemd-timesync)

For a new host trying to install deb11u6:

hnowlan@wikikube-worker2070:~$ id systemd-timesync
uid=999(systemd-timesync) gid=999(systemd-timesync) groups=999(systemd-timesync)

Short term, this can be hacked around by running sudo userdel systemd-timesync && sudo run-puppet-agent
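A minimal sketch of guarding that workaround so the user is only deleted when its uid really falls outside the configured system range. The helper names (is_system_uid, conf_value) are hypothetical, and the fallback bounds of 100 and 999 are an assumption based on stock Debian defaults; the puppetised adduser.conf on these hosts evidently uses a narrower range, which is not stated in this task.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any existing tooling.

# Return success if uid ($1) lies within the inclusive range [$2, $3].
is_system_uid() {
    uid=$1
    first=$2
    last=$3
    [ "$uid" -ge "$first" ] && [ "$uid" -le "$last" ]
}

# Print the numeric value of KEY ($1) from an adduser.conf-style file ($2),
# falling back to a default ($3) if the key is absent or commented out.
conf_value() {
    key=$1
    file=$2
    default=$3
    value=$(sed -n "s/^${key}=\([0-9][0-9]*\).*/\1/p" "$file" 2>/dev/null | tail -n 1)
    echo "${value:-$default}"
}

# Example usage on an affected host (destructive, so left commented out).
# The fallbacks 100 and 999 are ASSUMED defaults, not values from this task.
# uid=$(id -u systemd-timesync)
# first=$(conf_value FIRST_SYSTEM_UID /etc/adduser.conf 100)
# last=$(conf_value LAST_SYSTEM_UID /etc/adduser.conf 999)
# if ! is_system_uid "$uid" "$first" "$last"; then
#     sudo userdel systemd-timesync && sudo run-puppet-agent
# fi
```

With a hypothetical range of 100 to 499, uid 105 from wikikube-worker2050 would pass the check and uid 999 from wikikube-worker2070 would not, which matches the adduser error in the log below.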

Package failing to install via puppet:

Notice: /Stage[main]/Systemd::Timesyncd/File[/etc/systemd/timesyncd.conf]: Dependency Package[systemd-timesyncd] has failures: true
Warning: /Stage[main]/Systemd::Timesyncd/File[/etc/systemd/timesyncd.conf]: Skipping because of failed dependencies
Warning: /Stage[main]/Systemd::Timesyncd/Service[systemd-timesyncd]: Skipping because of failed dependencies
Error: Execution of '/usr/bin/apt-get -y -q remove --purge apt-listchanges' returned 100: Reading package lists...
Building dependency tree...                                                                      
Reading state information...                                                                                                                                                                       
The following package was automatically installed and is no longer required:
  python3-debconf                
Use 'sudo apt autoremove' to remove it.                                                          
The following packages will be REMOVED:                                                          
  apt-listchanges*                                                                               
su: warning: cannot change directory to /nonexistent: No such file or directory
ERROR:debmonitor:Failed to execute DebMonitor CLI: 'NoneType' object has no attribute 'source_name'  
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up systemd-timesyncd (247.3-7+deb11u6) ...
adduser: The user `systemd-timesync' already exists, but is not a system user. Exiting.
dpkg: error processing package systemd-timesyncd (--configure):
 installed systemd-timesyncd package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:                                                        
 systemd-timesyncd
E: Sub-process /usr/bin/dpkg returned an error code (1)

Event Timeline

hnowlan updated the task description.
hnowlan removed subscribers: ayounsi, cmooney.

I don't see anything obvious in the diff of those two packages.
The systems prior to yesterday seem to have installed systemd-timesyncd during d-i, whereas the new ones did not. There the user creation is the first thing logged in syslog (so right after d-i?).

hnowlan renamed this task from Issues reimaging kubernetes workers due to package user issues in systemd-timesyncd to Issues reimaging kubernetes workers due to user conflicts in systemd-timesyncd. Tue, Sep 3, 9:40 AM

Agreed that there's nothing in the diff so this isn't the package's fault most likely, and the uid range being used is what the package is configured to use. Good point on the user being created first - this happens well before we configure adduser.conf so the uid used is a valid system user up until we configure that file via puppet.

[...] - this happens well before we configure adduser.conf so the uid used is a valid system user up until we configure that file via puppet.

But that is also true for when the package is installed during d-i (where our adduser.conf is also not in place). But maybe it's a totally different story in case of d-i. I wonder why tasksel no longer selects the package. It seems to be the only one missing...

There was a point release yesterday which could explain why it changed cc @elukey

The new point release for Bullseye is https://www.debian.org/News/2024/2024083102, and indeed I updated the netinst image yesterday.

Could it be a matter of having an explicit ordering dependency between profile::adduser and profile::systemd::timesyncd in profile::base ? IIUC now the user is not installed anymore during d-i, and somehow there is a race between adduser.conf and systemd-timesyncd being created.

Just to be precise: The whole package (systemd-timesyncd) is not installed anymore during d-i, which is probably because of the new netinst image.

Sure, but if it is not d-i that installs it, then it is puppet (via systemd::timesyncd and the related profile included in profile::base) that does it, right? If there is a race condition we can fix it in puppet, this is my point, but maybe I am not getting what your idea/plan for the fix is.

I think this might be a bug in the latest systemd update for LTS:

  • systemd-timesyncd deb11u4 and deb11u5 have Priority: standard, i.e. they get automatically installed by debootstrap
  • deb11u6 however as released by the latest LTS security update has Priority: optional, i.e. it doesn't get installed by debootstrap.

So installations which happened on Monday, before https://lists.debian.org/debian-lts-announce/2024/09/msg00001.html went out, should have worked fine, but once deb11u6 was on security.debian.org the new Priority took effect and the package was no longer installed.
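For anyone wanting to check this on a host, a minimal sketch that pulls the Priority field out of a package stanza. The helper name priority_of is hypothetical, and the stanza fed to it here is trimmed to the fields mentioned in this task rather than copied from real apt output.

```shell
#!/bin/sh
# Sketch only: priority_of is a hypothetical helper name.

# Print the Priority field of the first package stanza read from stdin.
priority_of() {
    sed -n 's/^Priority:[[:space:]]*//p' | head -n 1
}

# Illustrative, trimmed stanza using only values quoted in this task.
printf 'Package: systemd-timesyncd\nVersion: 247.3-7+deb11u6\nPriority: optional\n' | priority_of

# On a real host the same helper could be fed from apt, for example:
# apt-cache show systemd-timesyncd=247.3-7+deb11u6 | priority_of
```

Comparing the value for deb11u5 and deb11u6 this way should show the standard versus optional difference described above, assuming both versions are still visible to apt on the host.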

I'll reproduce this in an nspawn container and report upstream.

For an explicit Puppet dependency we need to consider two cases:

  • On Bullseye the systemd-timesync user is created by adduser, so we need to depend on File['/etc/adduser.conf'] (the puppetised adduser.conf)
  • On Bookworm the system user is created by systemd-sysusers, so we need to depend on Systemd::Sysuser['/etc/sysusers.d/sysusers-base-config.conf']
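This is not the Puppet change itself, only a shell sketch of the same two cases as a manual preflight check before installing the package by hand. The function names are hypothetical, and the existence of the file does not prove it is the puppetised version.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Map a Debian codename to the config file that, per the two cases above,
# should be in place before the systemd-timesync user gets created.
required_config() {
    case "$1" in
        bullseye) echo /etc/adduser.conf ;;
        bookworm) echo /etc/sysusers.d/sysusers-base-config.conf ;;
        *) return 1 ;;
    esac
}

# Succeed if the file for codename ($1) exists under an optional root ($2).
# Returns 2 for a codename that is not covered by the two cases.
config_in_place() {
    path=$(required_config "$1") || return 2
    [ -f "${2:-}$path" ]
}

# Example usage on a host:
# config_in_place bullseye && sudo apt-get install systemd-timesyncd
```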

This has been tracked down following my report on IRC: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1003949 changed the Priority for systemd-timesyncd for bullseye, but that override is missing for bullseye-security. This will need to be fixed by the Debian archive team (FTP masters).

MoritzMuehlenhoff claimed this task.

The override is now fixed on the Debian archive side and bullseye installations should work again. Please reopen if you still see reimages failing.