Refactor wheel upload job to a separate job running on GH ephemeral runner by huydhn · Pull Request #4877 · pytorch/test-infra · GitHub
Refactor wheel upload job to a separate job running on GH ephemeral runner #4877

Merged
merged 13 commits into main on Jan 15, 2024

Conversation

huydhn (Contributor) commented Jan 13, 2024

To run the upload part in a separate upload job on GH ephemeral runners, we need:

  1. A specific artifact name for each binary, so that the upload job can find the correct one.
  2. Create a new GHA `setup-binary-upload` to:
    1. Download the artifacts from GitHub.
    2. Run `pkg-helpers` to figure out the correct S3 bucket and path to upload to.
  3. Create a new GHA reusable workflow `_binary_upload` to upload the artifacts to S3:
    1. It runs on the GH ephemeral runner `ubuntu-22.04`.
    2. Only this job has access to the credentials; the build job doesn't have that privilege anymore.
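
A reusable upload workflow along these lines could look like the following minimal sketch. The input name, secret names, bucket path, and action versions are illustrative assumptions, not the actual `pytorch/test-infra` implementation (which resolves the bucket and path via `pkg-helpers`):

```yaml
# .github/workflows/_binary_upload.yml — illustrative sketch only
name: Upload binary artifacts

on:
  workflow_call:
    inputs:
      artifact-name:
        description: Artifact produced by the build job for one matrix entry
        required: true
        type: string
    secrets:
      AWS_ACCESS_KEY_ID:
        required: true
      AWS_SECRET_ACCESS_KEY:
        required: true

jobs:
  upload:
    # GH-hosted ephemeral runner; only this job ever sees the credentials
    runs-on: ubuntu-22.04
    steps:
      - name: Download the build artifact
        uses: actions/download-artifact@v4
        with:
          name: ${{ inputs.artifact-name }}
      - name: Upload to S3
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        # Bucket and path are placeholders; in practice pkg-helpers computes them
        run: aws s3 cp --recursive . "s3://example-bucket/example-path/"
```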

A small caveat here is that the upload job will depend on the build job with all its configuration matrix, so it can only be run after all build configurations finish successfully, not when individual builds finish.

The PR is quite big, so I will do a similar follow-up for the conda build after this, using the same `_binary_upload` reusable workflow.
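
On the caller side, the dependency described above might be wired up roughly as follows; the job names, matrix entries, and artifact naming scheme are hypothetical:

```yaml
# Illustrative caller sketch — names and matrix values are assumptions
jobs:
  build:
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    # ... build steps upload one uniquely named artifact per matrix entry ...

  upload:
    # `needs` waits for the whole build matrix, not individual entries,
    # which is the caveat noted above
    needs: build
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    uses: ./.github/workflows/_binary_upload.yml
    with:
      artifact-name: wheel-py${{ matrix.python-version }}
    secrets: inherit
```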

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 13, 2024
vercel bot commented Jan 13, 2024

torchci deployment ignored, Jan 13, 2024 8:57am (UTC)

@huydhn huydhn changed the title Refactor binary upload job to a separate job running on GH ephemeral runner Refactor wheel upload job to a separate job running on GH ephemeral runner Jan 13, 2024
@huydhn huydhn requested review from atalman and malfet January 13, 2024 09:06
@huydhn huydhn marked this pull request as ready for review January 13, 2024 09:06
@huydhn huydhn merged commit 8acbaa9 into main Jan 15, 2024
huydhn added a commit that referenced this pull request Jan 15, 2024
I made a mistake in #4877 by not explicitly selecting `test-infra` as the repo to check out for the GHA. When running on domains, the default checkout is the domain repo itself, e.g. `pytorch/vision`, which doesn't have the GHA we need, namely `setup-binary-upload`.

An example failure when testing on the vision nightly:
https://github.com/pytorch/vision/actions/runs/7533535440/job/20506343331
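
The fix amounts to pinning the checkout to `pytorch/test-infra` regardless of which repo the workflow runs in. A sketch of the idea, with the action path and versions as assumptions rather than the actual fix:

```yaml
# Illustrative sketch — the action path and checkout location are assumptions
- name: Checkout test-infra
  uses: actions/checkout@v4
  with:
    repository: pytorch/test-infra  # explicit, so domain repos don't check out themselves
    path: test-infra
- name: Setup binary upload
  uses: ./test-infra/.github/actions/setup-binary-upload
```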
malfet pushed a commit that referenced this pull request Jan 16, 2024
…unner (#4886)

Similar to #4877, this moves
conda upload into a separate job on GH ephemeral runner:

* I need a new `_binary_conda_upload` reusable workflow because the conda
upload uses the anaconda client to upload to Anaconda, not awscli to upload to
S3.
* The build job doesn't have access to `pytorchbot-env` anymore, thus it
has no access to `CONDA_PYTORCHBOT_TOKEN` and
`CONDA_PYTORCHBOT_TOKEN_TEST` secrets. Only the upload job has this
access.
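
The conda counterpart would differ mainly in the upload step. A hedged sketch, taking the secret name from the description above but assuming the command shape:

```yaml
# Illustrative step only — file glob and flags are assumptions
- name: Upload to Anaconda
  env:
    CONDA_PYTORCHBOT_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }}
  run: anaconda --token "$CONDA_PYTORCHBOT_TOKEN" upload --skip-existing ./*.tar.bz2
```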
huydhn added a commit that referenced this pull request Feb 12, 2024
huydhn added a commit that referenced this pull request Feb 12, 2024
huydhn added a commit that referenced this pull request Feb 12, 2024
huydhn added a commit that referenced this pull request Feb 12, 2024
The list includes:

* #4870
* #4877
* #4882
* #4886
* #4891
* #4893
* #4894
* #4901

---------

Co-authored-by: Andrey Talman <atalman@fb.com>