

Cannot activate/re-activate SSH keys in BITU
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Log in to https://idm.wikimedia.org/ using my LDAP account
  • Navigate to "SSH keys", then click "Upload new key"
  • Paste the new key into the "SSH public key" text field and enter something in the "comment" field; keep the default value "Wikimedia Cloud Services SSH keys" in the "system" dropdown; click "Upload new SSH key"
  • The system returns to the SSH key list with the green success message "SSH key successfully uploaded." at the top. The new key is listed as "Active: Nein" (for some reason it uses the German "Nein" = "No" here)
  • Try to activate the key by clicking the "Activate" button, then the "activate" button again. Back on the SSH key list, a green success message says "SSH key has been activated.", but the key is still not activated.

What happens?:
The new SSH key is not being activated and I cannot use it.

What should have happened instead?:

  • New SSH key should have been activated, as the success message has reported.
  • If any errors had occurred, they should have been reported to me. I have not seen any error messages.

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):
N/A

Other information (browser name/version, screenshots, etc.):

  • I have tried this with a new rsa-4096 key as well as with a new ed25519 key. Same behavior, and no indication that something is wrong with the keys.
  • I have also suspended an older key that I had used for a couple of years. It is still listed, but I cannot re-activate it, although as far as I am aware this should be possible.

Event Timeline

Restricted Application added a subscriber: Aklapper.
SLyngshede-WMF triaged this task as High priority.

A first look indicates that the issue might be a missing comment in the comment field. I'm currently trying to reproduce how that may happen.
The second issue is how messaging is handled: we're missing a "form valid" check before displaying the "success" message.
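A minimal sketch of the missing check, in plain Python rather than BITU's Django code: the success message is only produced after validation passes, and a missing comment (the suspected cause above) is reported back as an error. Treating the comment as a required field, and the names `validate`, `handle_upload` and `UploadResult`, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UploadResult:
    # Hypothetical stand-in for the messages a view would show the user.
    messages: list[tuple[str, str]] = field(default_factory=list)
    saved: bool = False


def validate(form_data: dict[str, str]) -> dict[str, str]:
    """Return a mapping of field name -> error text; empty means valid."""
    errors = {}
    if not form_data.get("ssh_key", "").strip():
        errors["ssh_key"] = "An SSH public key is required."
    # Assumption for illustration: the comment is mandatory.
    if not form_data.get("comment", "").strip():
        errors["comment"] = "A comment is required."
    return errors


def handle_upload(form_data: dict[str, str]) -> UploadResult:
    result = UploadResult()
    errors = validate(form_data)
    if errors:
        # Report every problem instead of falling through to the success path.
        for field_name, text in errors.items():
            result.messages.append(("error", f"{field_name}: {text}"))
        return result
    # Only reached when the form is valid.
    result.saved = True
    result.messages.append(("success", "SSH key successfully uploaded."))
    return result
```

In a Django view the same idea amounts to calling `messages.success(...)` only inside the `if form.is_valid():` branch and re-rendering the form with its errors otherwise.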

Change #1038778 had a related patch set uploaded (by Slyngshede; author: Slyngshede):

[operations/software/bitu@master] Attempt to fix bug where SSH keys are imported incorrectly.

https://gerrit.wikimedia.org/r/1038778

@MisterSynergy I'm still trying to replicate the exact issue, but I believe we've found at least part of the issue.

If you are still stuck managing SSH keys, until the bug is fixed you can use WikiTech or https://toolsadmin.wikimedia.org/, both of which provide alternative interfaces for adding SSH keys for Wikimedia Cloud Services.

Change #1038778 merged by jenkins-bot:

[operations/software/bitu@master] Fix bug where SSH keys are imported incorrectly.

https://gerrit.wikimedia.org/r/1038778

@MisterSynergy We have deployed a potential bug fix to https://idm-test.wikimedia.org. This installation uses the production LDAP server, but a separate database. This means that you should be able to test SSH key upload, activation and deactivation, but inactive keys may differ from those shown in production, as these only exist in the database of each of the two installations.

If you have the time and feel comfortable doing so, please test whether your SSH keys look and behave correctly in https://idm-test.wikimedia.org. If everything checks out, we'll schedule a new release for next week.

I have tried https://idm-test.wikimedia.org/; the results are as follows:

  • When uploading a new key, I get an error message that reads "500 Internal Server Error - Your request caused an error on the server." Nevertheless, the key is uploaded and listed in the tool, but not activated.
  • When trying to activate the key, I get the same error message "500 Internal Server Error - Your request caused an error on the server.", and the key remains inactive.
  • I can successfully delete the key.

I tried the entire procedure twice, and the behavior seems reproducible.

Change #1046613 had a related patch set uploaded (by Slyngshede; author: Slyngshede):

[operations/software/bitu@master] SSH Key mgmt: Ensure that keys are trimmed

https://gerrit.wikimedia.org/r/1046613

Change #1046613 merged by jenkins-bot:

[operations/software/bitu@master] SSH Key mgmt: Ensure that keys are trimmed

https://gerrit.wikimedia.org/r/1046613
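The patch title only says that keys are trimmed; the sketch below shows one plausible shape of such normalization, not the code in change #1046613. It assumes the usual OpenSSH public key layout of key type, base64 blob and optional comment, and the function name `normalize_ssh_key` is hypothetical.

```python
import base64


def normalize_ssh_key(raw: str) -> str:
    """Trim a pasted OpenSSH public key (hypothetical helper).

    Leading/trailing whitespace, including trailing newlines, is removed and
    runs of internal whitespace collapse to a single space, giving
    "<type> <base64-blob> [comment]" on one line.
    """
    parts = raw.split()  # split() with no argument also drops \r, \n and tabs
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    key_type, blob = parts[0], parts[1]
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(blob, validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError("key blob is not valid base64") from exc
    comment = " ".join(parts[2:])
    return f"{key_type} {blob} {comment}".rstrip()
```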

@MisterSynergy Thank you for testing. Based on the error logs, I believe that we were able to reproduce the bug you found.

A patch has been rolled out to idm-test.wikimedia.org, hopefully that should fix your issue. If you have the time, could you please re-test?

Sorry for the delay, this somehow almost got lost. Anyways, it is still not working for me.

Initially, I got a 403 error when trying to upload a key. It complained that I had deactivated HTTP referers (which I had indeed done in the browser config). After allowing HTTP referers, this error message went away. This is the first time I have seen this in this tool, although I have had HTTP referers deactivated for a long time.

After temporarily activating HTTP referers, I uploaded a key and got the error message "500 Internal Server Error - Your request caused an error on the server.". The key was listed nonetheless, but I was not able to activate it: that failed with the same "500 Internal Server Error" message. Deleting keys works without errors.

From the log file we do see:

Forbidden (Referer checking failed - no Referer.): /keymanagement/create/

This triggers an exception, which should be easy enough to replicate.
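For context, this log line comes from Django's CSRF protection (`CsrfViewMiddleware`): for an HTTPS request with an unsafe method such as POST, it expects a same-origin `Referer` header (in recent Django versions, unless the request was already verified through its `Origin` header). The function below is a simplified, hypothetical model of that decision, not Django's code; it ignores `CSRF_TRUSTED_ORIGINS`, the token check and other settings, and only the no-Referer message is taken from the log.

```python
from typing import Optional
from urllib.parse import urlsplit

# Reason text copied from the log line above; the other texts are illustrative.
REASON_NO_REFERER = "Referer checking failed - no Referer."
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def referer_rejection(method: str, is_https: bool, headers: dict[str, str],
                      host: str) -> Optional[str]:
    """Return a rejection reason, or None if the simplified check passes.

    Headers are assumed to use canonical capitalization ("Origin", "Referer").
    """
    if method.upper() in SAFE_METHODS:
        return None  # page views are never Referer-checked
    if not is_https or "Origin" in headers:
        return None  # Origin-based and plain-HTTP paths are not modelled here
    referer = headers.get("Referer")
    if referer is None:
        return REASON_NO_REFERER
    parts = urlsplit(referer)
    if parts.scheme != "https":
        return "Referer checking failed - insecure Referer."
    if parts.netloc != host:
        return "Referer checking failed - Referer does not match host."
    return None
```

Under this model, a browser that suppresses referers can still browse the tool (GET) but fails the upload POST with a 403, which matches what was reported.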

The second error makes a little less sense, but it's clearly in the update_ssh_key/syncronize method:

File "/usr/lib/python3/dist-packages/django/dispatch/dispatcher.py", line 181, in <listcomp>
  (receiver, receiver(signal=self, sender=sender, **named))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/idm/bitu/src/bitu/keymanagement/signals.py", line 27, in update_ssh_key
  raise e
File "/srv/idm/bitu/src/bitu/keymanagement/signals.py", line 20, in update_ssh_key
  import_string(f'{instance.system}.helpers.syncronize_ssh_keys')(instance.user)
File "/srv/idm/bitu/src/bitu/ldapbackend/helpers.py", line 109, in syncronize_ssh_keys
  jobs.syncronize_ssh_keys(user)
File "/srv/idm/bitu/src/bitu/ldapbackend/jobs.py", line 158, in syncronize_ssh_keys
  load_ssh_key(user)
File "/srv/idm/bitu/src/bitu/ldapbackend/jobs.py", line 172, in load_ssh_key
  key.save()
File "/srv/idm/bitu/src/bitu/keymanagement/models.py", line 39, in save
  super().save(*args, **kwargs)
File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 739, in save
  self.save_base(using=using, force_insert=force_insert,
File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 787, in save_base
  post_save.send(
File "/usr/lib/python3/dist-packages/django/dispatch/dispatcher.py", line 180, in send
  return [

In the end we get a "RecursionError: maximum recursion depth exceeded"; part of the cause seems to be a failure to decode the LDAP update or query.
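The frames in the traceback form a cycle: `post_save` calls `update_ssh_key`, which calls `syncronize_ssh_keys` and `load_ssh_key`, which calls `key.save()`, which sends `post_save` again. The stripped-down model below (plain Python, no Django or LDAP; every body is hypothetical) reproduces that shape. It assumes, purely for illustration, that LDAP hands the key back with extra trailing whitespace, so an exact string comparison never recognizes it and every save triggers another save until Python raises `RecursionError`.

```python
# Minimal stand-in for Django's post_save signal: a list of receivers.
post_save_receivers = []


class SSHKey:
    def __init__(self, user: str, key: str):
        self.user = user
        self.key = key

    def save(self) -> None:
        # Like Model.save(): persist (omitted), then notify receivers.
        for receiver in post_save_receivers:
            receiver(self)


def keys_match(stored: str, from_ldap: str) -> bool:
    # Hypothetical brittle comparison: exact string equality.
    return stored == from_ldap


def load_ssh_key(user: str, stored_key: str, ldap_key: str) -> None:
    if not keys_match(stored_key, ldap_key):
        # Treated as a new key from LDAP and saved, which fires post_save again.
        SSHKey(user, ldap_key).save()


def update_ssh_key(instance: SSHKey) -> None:
    # Push to LDAP (omitted), then read it back.
    ldap_copy = instance.key + " "  # assumption: LDAP returns a variant
    load_ssh_key(instance.user, instance.key, ldap_copy)


post_save_receivers.append(update_ssh_key)
```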

The good news is that I saw the same issue while debugging previously, but apparently I was mistaken in thinking I had created a test case covering it.

I shall return :-)

Thank you so much for testing and reporting back, it really does help.

Change #1051293 had a related patch set uploaded (by Slyngshede; author: Slyngshede):

[operations/software/bitu@master] LDAP key sync: Improvements to SSH key sync with LDAP.

https://gerrit.wikimedia.org/r/1051293

Change #1051293 merged by jenkins-bot:

[operations/software/bitu@master] LDAP key sync: Improvements to SSH key sync with LDAP.

https://gerrit.wikimedia.org/r/1051293

@MisterSynergy We deployed an update to https://idm-test.wikimedia.org

If you have the time, please test if this fixes your problem. What we've done is re-work the key comparison between the IDM and LDAP to use fingerprinting of the keys.

I suspect what's happening is that you have a key that is "malformed", not in the sense that it's invalid, but in the sense that the IDM parses it incorrectly and then fails to do the correct comparison. Relying on fingerprinting instead should alleviate that issue.
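The fingerprint comparison can be sketched with the standard library alone. OpenSSH's default fingerprint is `SHA256:` followed by the unpadded base64 encoding of the SHA-256 digest of the decoded key blob; because only the blob is hashed, differences in surrounding whitespace or in the comment no longer affect the result. The helper names `ssh_fingerprint` and `same_key` are hypothetical, and this is not BITU's implementation.

```python
import base64
import hashlib


def ssh_fingerprint(public_key: str) -> str:
    """Return an OpenSSH-style SHA256 fingerprint for a public key line."""
    parts = public_key.split()
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    blob = base64.b64decode(parts[1], validate=True)
    digest = hashlib.sha256(blob).digest()
    # ssh-keygen -l prints the base64 digest without trailing '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def same_key(a: str, b: str) -> bool:
    # Compare keys by fingerprint instead of by their raw text.
    return ssh_fingerprint(a) == ssh_fingerprint(b)
```

With this comparison, a key that comes back from LDAP with extra whitespace or a different comment is recognized as the key already stored, so there is no reason to save it again.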

Part of the problem you've been seeing is also due to how signals are/were handled internally. In some cases it was possible to end up in a loop. Correct key comparison should have prevented that, but clearly didn't. Now there is a controllable way for the application to stop the signal recursion; e.g. when a key is imported from LDAP, the IDM won't attempt to push it back to LDAP.
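The mechanism described above for stopping the signal recursion can be illustrated with an explicit flag that records where a save came from. In this hypothetical sketch, keys saved while importing from LDAP carry `from_ldap=True` and the receiver skips the push back to LDAP for them, so the cycle ends after one round trip. The flag name and structure are assumptions; in Django, one common way to get the same effect is to set an attribute on the instance before `save()` and check it in the `post_save` receiver.

```python
post_save_receivers = []
ldap_pushes = []  # records what would have been written to LDAP


class SSHKey:
    def __init__(self, user: str, key: str):
        self.user = user
        self.key = key
        self.from_ldap = False

    def save(self, *, from_ldap: bool = False) -> None:
        # Remember the origin of this save so receivers can act on it.
        self.from_ldap = from_ldap
        for receiver in post_save_receivers:
            receiver(self)


def import_from_ldap(user: str, ldap_key: str) -> None:
    # Hypothetical import path: keys read back from LDAP set the flag.
    SSHKey(user, ldap_key).save(from_ldap=True)


def update_ssh_key(instance: SSHKey) -> None:
    if instance.from_ldap:
        return  # came from LDAP: do not push it back, which ends the cycle
    ldap_pushes.append(instance.key)
    # Assumption: LDAP returns a slightly different variant of the key.
    import_from_ldap(instance.user, instance.key + " ")


post_save_receivers.append(update_ssh_key)
```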

Believed to be resolved. The fix has been deployed to production.