Requesting to add new Hostnames onto Salt master. · Issue #162635 · flutter/flutter · GitHub
Requesting to add new Hostnames onto Salt master. #162635

Open
kentnguyen99 opened this issue Feb 3, 2025 · 28 comments
Labels
team-infra Owned by Infrastructure team

Comments

@kentnguyen99

Type of Request

bug

Infrastructure Environment

LUCI, Github, Cocoon scheduler, Autosubmit, etc...

What is happening?

Hi Brandon,

We are going to add new servers to the Device Lab. Please add the following hostnames to the Salt master.

flutter-devicelab-mac-50.mtv.corp.google.com
flutter-devicelab-mac-51.mtv.corp.google.com
flutter-devicelab-mac-52.mtv.corp.google.com

flutter-devicelab-linux-74.mtv.corp.google.com
flutter-devicelab-linux-75.mtv.corp.google.com
flutter-devicelab-linux-76.mtv.corp.google.com

Thanks so much,
Kent

Steps to reproduce

Step 1:
Step 2:
..
Step n:

Expected results

I expect to see X when Y is finished.

@bdero
Member
bdero commented Feb 4, 2025

I just added the mac-50 key, but it looks like the other 5 minions are not trying to communicate with the Salt master yet, so their keys are not known yet.

@kentnguyen99
Author

Hi Brandon,

I'm getting the errors below when I run the command "salt-call state.apply":

flutter@flutter-deviclab-mac-50 ~ % sudo salt-call state.apply flutter.code_signing
Password:
[ERROR ] Rendering exception occurred
Traceback (most recent call last):
File "/opt/salt/lib/python3.10/site-packages/salt/utils/templates.py", line 476, in render_jinja_tmpl
output = template.render(**decoded_context)
File "/opt/salt/lib/python3.10/site-packages/jinja2/environment.py", line 1301, in render
self.environment.handle_exception()
File "/opt/salt/lib/python3.10/site-packages/jinja2/environment.py", line 936, in handle_exception
raise rewrite_traceback_stack(source=source)
File "", line 19, in top-level template code
File "/opt/salt/lib/python3.10/site-packages/jinja2/filters.py", line 1007, in do_trim
return soft_str(value).strip(chars)
jinja2.exceptions.UndefinedError: 'salt.utils.context.NamespacedDictWrapper object' has no attribute 'flutter_password'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/salt/lib/python3.10/site-packages/salt/utils/templates.py", line 218, in render_tmpl
output = render_str(tmplstr, context, tmplpath)
File "/opt/salt/lib/python3.10/site-packages/salt/utils/templates.py", line 482, in render_jinja_tmpl
raise SaltRenderError("Jinja variable {}{}".format(exc, out), line, tmplstr)
salt.exceptions.SaltRenderError: Jinja variable 'salt.utils.context.NamespacedDictWrapper object' has no attribute 'flutter_password'; line 19


[...]

Somehow the cwd parameter doesn't work, so it has to cd into the directory.

restore_keychain:
cmd.run:
- name: |
cd /opt/flutter/code_signing
LOGIN_PASSWORD={{ pillar['flutter_password']|trim }} ./nuke.sh <======================
security create-keychain -p {{ pillar['flutter_password']|trim }} login.keychain
security set-keychain-settings -l -u -t 1800 login.keychain
LOGIN_PASSWORD={{ pillar['flutter_password']|trim }} ./restore.sh
- runas: {{ swarming_user.user }}
- onchanges:
[...]

[CRITICAL] Rendering SLS 'base:flutter.code_signing' failed: Jinja variable 'salt.utils.context.NamespacedDictWrapper object' has no attribute 'flutter_password'; line 19


[...]

Somehow the cwd parameter doesn't work, so it has to cd into the directory.

restore_keychain:
cmd.run:
- name: |
cd /opt/flutter/code_signing
LOGIN_PASSWORD={{ pillar['flutter_password']|trim }} ./nuke.sh <======================
security create-keychain -p {{ pillar['flutter_password']|trim }} login.keychain
security set-keychain-settings -l -u -t 1800 login.keychain
LOGIN_PASSWORD={{ pillar['flutter_password']|trim }} ./restore.sh
- runas: {{ swarming_user.user }}
- onchanges:
[...]

local:
Data failed to compile:

Rendering SLS 'base:flutter.code_signing' failed: Jinja variable 'salt.utils.context.NamespacedDictWrapper object' has no attribute 'flutter_password'; line 19

[...]

Somehow the cwd parameter doesn't work, so it has to cd into the directory.

restore_keychain:
cmd.run:
- name: |
cd /opt/flutter/code_signing
LOGIN_PASSWORD={{ pillar['flutter_password']|trim }} ./nuke.sh <======================
security create-keychain -p {{ pillar['flutter_password']|trim }} login.keychain
security set-keychain-settings -l -u -t 1800 login.keychain
LOGIN_PASSWORD={{ pillar['flutter_password']|trim }} ./restore.sh
- runas: {{ swarming_user.user }}
- onchanges:
[...]

flutter@flutter-deviclab-mac-50 ~ %

@kentnguyen99
Author

flutter@flutter-deviclab-mac-50 /opt % sudo salt-call state.apply
local:

      ID: states
Function: no.None
  Result: False
 Comment: No Top file or master_tops data matches found. Please see master log for details.
 Changes:   

Summary for local

Succeeded: 0
Failed: 1

@kentnguyen99
Author

Hi Brandon,

Can you remove and then re-add again? I'm getting different errors this time.

flutter@flutter-deviclab-mac-50 /opt % sudo salt-call state.apply flutter.code_signing
[ERROR ] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate
Minion failed to authenticate with the master, has the minion key been accepted?

@bdero
Member
bdero commented Feb 4, 2025

Can you remove and then re-add again? I'm getting different errors this time.

Done.

It sounds to me like the minion config got FUBAR'd for a moment there. 😅

Please let me know once the other 5 hosts have been bootstrapped. We'll be able to accept the other new keys then.
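For reference, the remove-and-re-add cycle requested above could look roughly like the sketch below when done on the master. It only prints the `salt-key` commands for review rather than running them. The helper name `print_rekey_commands` is hypothetical, and the sketch assumes the minion ID is the short hostname; it is not necessarily what was actually run here.

```shell
#!/bin/sh
# Sketch only: prints the salt-key commands for review instead of running them.
# Assumption: the minion ID is the short hostname (e.g. flutter-devicelab-mac-50).
print_rekey_commands() {
  minion_id=$1
  # Drop the stale key the master has cached for this minion.
  printf 'salt-key -y -d %s\n' "$minion_id"
  # Once the minion reconnects, its new key should be listed as unaccepted.
  printf 'salt-key -l unaccepted\n'
  # Accept the new key.
  printf 'salt-key -y -a %s\n' "$minion_id"
}

print_rekey_commands flutter-devicelab-mac-50
```

Printing the commands first keeps the destructive `-d` step visible before anything is executed on the master.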

@kentnguyen99
Author

Hi Brandon,
Can you add a key for flutter-devicelab-mac-51.mtv.corp.google.com to Salt master? I'm working on mac-51 now.
Thanks,

flutter@flutter-devicelab-mac-51 ~ % sudo salt-call state.apply flutter.code_signing
[ERROR ] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate
Minion failed to authenticate with the master, has the minion key been accepted?

@bdero
Member
bdero commented Feb 4, 2025

Sure, mac-51 added.

@kentnguyen99
Author

Hi Brandon,

I ran the command "sudo salt-call state.apply" a second time on mac-50, and it hung for 20 minutes without making any further progress. Can you take a look at flutter-devicelab-mac-50? Thanks.

flutter@flutter-devicelab-mac-50 ~ % sudo salt-call state.apply
/opt/salt/lib/python3.10/site-packages/salt/modules/x509.py:98: DeprecationWarning: The x509 modules are deprecated. Please migrate to the replacement modules (x509_v2). They are the default from Salt 3008 (Argon) onwards.
salt.utils.versions.warn_until(
/opt/salt/lib/python3.10/site-packages/salt/states/x509.py:211: DeprecationWarning: The x509 modules are deprecated. Please migrate to the replacement modules (x509_v2). They are the default from Salt 3008 (Argon) onwards.
salt.utils.versions.warn_until(

@kentnguyen99
Author
kentnguyen99 commented Feb 4, 2025

The same problem occurs on the mac-51 machine as well. The "salt-call state.apply" command hangs and does not go any further.

flutter@flutter-devicelab-mac-51 ~ % sudo salt-call state.apply
Password:
/opt/salt/lib/python3.10/site-packages/salt/modules/x509.py:98: DeprecationWarning: The x509 modules are deprecated. Please migrate to the replacement modules (x509_v2). They are the default from Salt 3008 (Argon) onwards.
salt.utils.versions.warn_until(
/opt/salt/lib/python3.10/site-packages/salt/states/x509.py:211: DeprecationWarning: The x509 modules are deprecated. Please migrate to the replacement modules (x509_v2). They are the default from Salt 3008 (Argon) onwards.
salt.utils.versions.warn_until(

@bdero
Member
bdero commented Feb 4, 2025

Definitely some weird stuff going on.

@christopherfujino Would mac-50 be our first fresh provision attempt after the switch to 3006.9?

mac-49 is currently running the 3002.9 minion and mac-45 is running 3004.2:

flutter-devicelab-mac-49 ~ % salt-minion --version
salt-minion 3002.9

flutter-devicelab-mac-45 ~ % salt-minion --version
salt-minion 3004.2

@bdero
Member
bdero commented Feb 4, 2025

Minion isn't responding with the job feed, but I did find complaints about m2crypto missing in the logs:

2025-02-04 09:46:57,505 [salt.state       :323 ][ERROR   ][435] State 'x509.private_key_managed' was not found in SLS 'machine_tokend'
Reason: 'x509' __virtual__ returned False: Could not load x509 state: m2crypto unavailable

@bdero
Member
bdero commented Feb 4, 2025

Perhaps this is the issue: flutter/cocoon#4002 (comment)
Maybe the job output would show us failing to build m2crypto, if I could fetch it. 😅

@kentnguyen99
Author

Hi Brandon,

The output shows that M2Crypto is installed when I run the command "salt-call --versions" on both the mac-50 and mac-51 machines.

flutter@flutter-devicelab-mac-51 ~ % sudo salt-call --versions
Password:
Salt Version:
Salt: 3006.3

Python Version:
Python: 3.10.13 (main, Sep 6 2023, 02:16:00) [Clang 14.0.0 (clang-1400.0.29.202)]

Dependency Versions:
cffi: 1.14.6
cherrypy: 18.6.1
dateutil: 2.8.0
docker-py: Not Installed
gitdb: 4.0.5
gitpython: 3.1.32
Jinja2: 3.1.2
libgit2: Not Installed
looseversion: 1.0.2
M2Crypto: 0.38.0
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.9.8
pygit2: Not Installed
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.13.10
smmap: 3.0.2
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4

System Versions:
dist: darwin 24.1.0
locale: utf-8
machine: x86_64
release: 24.1.0
system: Darwin
version: 15.1 x86_64

@bdero
Member
bdero commented Feb 4, 2025

Yeah the env already has m2crypto.

Requirement already satisfied: m2crypto in /opt/salt/lib/python3.10/site-packages (0.38.0)

I'm going to try killing the job and re-running while monitoring the states on mac-50.

@bdero
Member
bdero commented Feb 5, 2025

It's hanging in homebrew.sls while running.

[INFO    ] Executing state cmd.run for [NONINTERACTIVE=1 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)" 2>&1 | tee brew_installation_log.txt]

It's attempting to run this because the unless check fails (which runs brew info):

[DEBUG   ] stdout: /bin/bash: brew: command not found

However, brew is already installed.

The install log hangs after /usr/bin/sudo /bin/chmod u+rwx /usr/local/bin /usr/local/sbin:

% sudo cat /Users/swarming/brew_installation_log.txt
==> Running in non-interactive mode because `$NONINTERACTIVE` is set.
==> Checking for `sudo` access (which may request your password)...
==> This script will install:
/usr/local/bin/brew
/usr/local/share/doc/homebrew
/usr/local/share/man/man1/brew.1
/usr/local/share/zsh/site-functions/_brew
/usr/local/etc/bash_completion.d/brew
/usr/local/Homebrew
==> The following existing directories will be made group writable:
/usr/local/bin
/usr/local/sbin
==> The following existing directories will have their owner set to swarming:
/usr/local/bin
/usr/local/sbin
==> The following existing directories will have their group set to admin:
/usr/local/bin
/usr/local/sbin
==> /usr/bin/sudo /bin/chmod u+rwx /usr/local/bin /usr/local/sbin

I took a closer look at the state, and we're going down the false case for this grain conditional: {%- if grains['cpu_model'] == 'Apple M1' %}
The Apple M1 path adds homebrew stuff to the swarming user's .bash_profile.

mac-50 has an M4 Pro:

% sysctl -n machdep.cpu.brand_string
Apple M4 Pro

... and sure enough, the cpu_model grain is set to Apple M4 Pro

% sudo salt-call grains.get 'cpu_model'
Password:
local:
    Apple M4 Pro

I'm not sure if this is the only problem yet, but this check is too specific. It should be checking for arm64 architecture instead.
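The failing `unless` check described above (`brew: command not found` even though brew is installed) depends on `brew` being on the PATH of the shell that Salt spawns. As a rough sketch of an alternative guard, and not the fix that was deployed, a check could look for the executable at Homebrew's default prefixes instead. The helper name `find_brew` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate path that is an executable file.
find_brew() {
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Homebrew's default prefixes: /opt/homebrew on Apple silicon, /usr/local on Intel.
if brew_path=$(find_brew /opt/homebrew/bin/brew /usr/local/bin/brew); then
  echo "brew found at $brew_path"
else
  echo "brew not found in the default locations"
fi
```

A guard of this kind does not rely on `.bash_profile` having been set up for the swarming user, which is the part that differed between the two branches of the grain conditional.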

@kentnguyen99
Author

@bdero

Yes. All new Mac minis come with the M4 chip.
Can I run "salt-call state.apply" command now? or Do you need more times to fix the issue?
Thanks.

@bdero
Member
bdero commented Feb 5, 2025

Can I run "salt-call state.apply" command now? or Do you need more times to fix the issue?

I think I'll need to roll out a fix.

Found good examples to reference:

  • mac-45 has Apple M1 and is healthy & idling in the try pool.
  • mac-49 has Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
  • mac-50 has Apple M4 Pro

There are 10 places in the codebase where we're doing this special Apple M1 check.

osarch is useless and returns x86_64 for Apple silicon.
There aren't any other straightforward options, although it would probably be safe to check for gpus[0].vendor == 'apple'.

I think our best bet is:

{% if grains['cpu_model'].startswith('Apple') %}
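To illustrate how the proposed prefix check behaves on the three reference machines, here is a minimal shell sketch that mirrors `startswith('Apple')` with a `case` pattern. The function name `is_apple_cpu_model` is made up for this example; the model strings are the ones quoted in this thread.

```shell
#!/bin/sh
# Sketch: shell equivalent of the Jinja test grains['cpu_model'].startswith('Apple').
is_apple_cpu_model() {
  case "$1" in
    Apple*) return 0 ;;
    *) return 1 ;;
  esac
}

# CPU model strings reported for mac-45, mac-49 and mac-50 in this thread.
for model in "Apple M1" "Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz" "Apple M4 Pro"; do
  if is_apple_cpu_model "$model"; then
    echo "$model: Apple silicon branch"
  else
    echo "$model: Intel branch"
  fi
done
```

Unlike the exact comparison with `'Apple M1'`, the prefix match sends both the M1 and the M4 Pro down the same branch.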

@bdero
Member
bdero commented Feb 5, 2025

Okay, I deployed a patch and mac-50 is no longer hanging:

-------------
Succeeded: 44 (changed=18)
Failed:     1
-------------
Total states run:     45
Total run time:   37.466 s

There is one state failing, however:

----------
          ID: disable_gatekeeper
    Function: cmd.run
        Name: # Disable gatekeeper to prevent failures when updating flutter packages.
spctl --master-disable

      Result: False
     Comment: Command "# Disable gatekeeper to prevent failures when updating flutter packages.
              spctl --master-disable
              " run
     Started: 17:01:00.221843
    Duration: 35.288 ms
     Changes:
              ----------
              pid:
                  19378
              retcode:
                  1
              stderr:
              stdout:
                  Globally disabling the assessment system needs to be confirmed in System Settings.

It looks like gatekeeper may need to be manually turned off on newly provisioned machines?

@bdero
Member
bdero commented Feb 5, 2025

@kentnguyen99
Author
kentnguyen99 commented Feb 5, 2025

Hi @bdero,

Can you remove and re-add the keys for mac-50, mac-51 and mac-52 on the Salt master? I'm getting the errors below on mac-50, mac-51 and mac-52 when I run the "salt-call state.apply" command.

flutter@flutter-devicelab-mac-50 ~ % sudo salt-call state.apply
[ERROR ] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate
Minion failed to authenticate with the master, has the minion key been accepted?

flutter@flutter-devicelab-mac-51 ~ % sudo salt-call state.apply
[ERROR ] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate
Minion failed to authenticate with the master, has the minion key been accepted?

flutter@flutter-devicelab-mac-52 ~ % sudo salt-call state.apply flutter.code_signing
Password:
[ERROR ] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate
Minion failed to authenticate with the master, has the minion key been accepted?

@bdero
Member
bdero commented Feb 5, 2025

I added keys for mac-50, mac-51, and mac-52.

Since this is the 3rd time for mac-50 and the second time for mac-51, maybe there's a config issue on the master causing the keys to get forgotten.

@kentnguyen99
Author

@bdero, all good now. mac-50, mac-51 and mac-52 have been added to the Salt master. However, all three machines hit the same failure: the "Gatekeeper" issue. I disabled assessments on all three machines but still get 1 failed state when I run the command "salt-call state.apply".

flutter@flutter-devicelab-mac-50 ~ % sudo spctl --status
Password:
assessments disabled

ID: disable_gatekeeper
Function: cmd.run
Name: # Disable gatekeeper to prevent failures when updating flutter packages.
spctl --master-disable

  Result: False
 Comment: Command "# Disable gatekeeper to prevent failures when updating flutter packages.
          spctl --master-disable
          " run
 Started: 13:22:25.127820
Duration: 35.695 ms
 Changes: 

Summary for local

Succeeded: 44 (changed=12)
Failed: 1
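Since `spctl --status` already reports `assessments disabled` while the `disable_gatekeeper` state still fails, one option would be to skip the command when Gatekeeper is already off, for example through an `unless` guard on the state. The sketch below shows only the shell side of such a check, under that assumption; `gatekeeper_disabled` is a hypothetical helper, not part of the existing state.

```shell
#!/bin/sh
# Returns 0 when the given `spctl --status` output reports Gatekeeper as disabled.
gatekeeper_disabled() {
  case "$1" in
    *"assessments disabled"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Capture the status text; "|| true" keeps the script going if spctl exits non-zero
# or is not available (for example when this is tried on a non-macOS host).
status=$(spctl --status 2>&1 || true)

if gatekeeper_disabled "$status"; then
  echo "Gatekeeper already disabled; spctl --master-disable can be skipped"
else
  echo "Gatekeeper not reported as disabled"
fi
```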

@kentnguyen99
Author

Hi Brandon,

Could you remove and re-add the keys for mac-50, mac-51 and mac-52 on the Salt master? I'm getting the errors below again.
I don't know why these new Mac minis keep getting out of sync with the Salt master.
Thanks.

flutter@flutter-devicelab-mac-50 ~ % sudo salt-call state.apply
[ERROR ] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate
Minion failed to authenticate with the master, has the minion key been accepted?

flutter@flutter-devicelab-mac-51 ~ % sudo salt-call state.apply
[ERROR ] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate
Minion failed to authenticate with the master, has the minion key been accepted?

flutter@flutter-devicelab-mac-52 ~ % sudo salt-call state.apply
[ERROR ] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate
Minion failed to authenticate with the master, has the minion key been accepted?

@bdero
Member
bdero commented Feb 7, 2025

I accepted the 3 new mac keys again. I'm currently looking into why they are getting periodically forgotten.

@bdero
Member
bdero commented Feb 7, 2025

I added the new names to autosign_grains, so hopefully they won't keep dropping off from here.
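As a rough illustration of what adding names for grain-based autosigning can involve: Salt's master reads accepted grain values from files in the directory named by `autosign_grains_dir`, one file per grain and one value per line. The sketch below appends values idempotently to such a file. The directory used, the helper name `add_autosign_value`, and the assumption that the `fqdn` grain equals the full hostnames from this issue are all illustrative, not taken from the actual master config.

```shell
#!/bin/sh
# Sketch: idempotently add a grain value to an autosign file.
# Assumption: the file lives in the master's autosign_grains_dir and is named
# after the grain (here: fqdn). The real location on this master is not known.
add_autosign_value() {
  file=$1
  value=$2
  mkdir -p "$(dirname "$file")"
  touch "$file"
  # -x matches whole lines only; -F treats the value as a literal string.
  if ! grep -qxF -- "$value" "$file"; then
    printf '%s\n' "$value" >> "$file"
  fi
}

# Demo against a scratch directory rather than a real master.
demo_dir=$(mktemp -d)
for host in flutter-devicelab-mac-50 flutter-devicelab-mac-51 flutter-devicelab-mac-52; do
  add_autosign_value "$demo_dir/autosign_grains/fqdn" "$host.mtv.corp.google.com"
done
cat "$demo_dir/autosign_grains/fqdn"
rm -rf "$demo_dir"
```

Running the helper twice with the same value leaves a single entry, so it is safe to re-run when more hosts are added later.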

@kentnguyen99
Author

@bdero Hi Brandon,

Could you take a look at the "provision_salt.sh" script again? Since the script was modified, it no longer works on the Mac minis. I ran "./cocoon/dev/provision_salt.sh dev" today and got the errors below.

flutter@flutter-devicelab-mac-50 Downloads % ./cocoon/dev/provision_salt.sh dev
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 56.5M  100 56.5M    0     0  53.4M      0  0:00:01  0:00:01 --:--:-- 96.8M
installer: Package name is Salt 3006.3 (Python 3)
installer: Upgrading at base path /
installer: The upgrade was successful.
master: salt-flutter.endpoints.fuchsia-infra.cloud.goog
id: flutter-devicelab-mac-50
autosign_grains:
  - fqdn
cp: salt.minion.plist: No such file or directory

@bdero
Member
bdero commented Feb 18, 2025

Hey @kentnguyen99, I think running the provision script from within the cocoon/dev/ directory should work.

% pwd
/Users/flutter/Downloads/cocoon/dev
% ./provision_salt.sh dev
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 56.5M  100 56.5M    0     0  42.4M      0  0:00:01  0:00:01 --:--:-- 65.0M
installer: Package name is Salt 3006.3 (Python 3)
installer: Upgrading at base path /
installer: The upgrade was successful.
master: salt-flutter.endpoints.fuchsia-infra.cloud.goog
id: flutter-devicelab-mac-50
autosign_grains:
  - fqdn
salt-minion 3006.3 (Sulfur)
Succeed!

@kentnguyen99
Author

Yes, confirmed. It works when run from within the cocoon/dev/ directory; I will make a note of this. Thank you!
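The `cp: salt.minion.plist: No such file or directory` error above suggests the script copies that file using a path relative to the caller's working directory. A common way to make a script independent of where it is invoked from is to resolve its own directory first. The following is a minimal sketch of that technique with a hypothetical `script_dir` helper; it is not the current contents of provision_salt.sh.

```shell
#!/bin/sh
# Sketch: resolve the directory containing a script so that relative paths
# (such as salt.minion.plist) do not depend on the caller's working directory.
script_dir() {
  # Runs in a subshell so the caller's directory is untouched; CDPATH is
  # cleared so that cd cannot resolve to an unrelated directory.
  (CDPATH= cd -- "$(dirname -- "$1")" && pwd)
}

here=$(script_dir "$0")
echo "script directory: $here"
# A provisioning script could then reference its neighbours explicitly, e.g.
#   "$here/salt.minion.plist"
```

With this in place, `./cocoon/dev/provision_salt.sh dev` and `./provision_salt.sh dev` would both find files that sit next to the script.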
