Separate arm64 and amd64 docker builds #125617
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125617
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 4 Unrelated Failures as of commit fd3c207 with merge base bb668c6.
NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
.github/workflows/docker-release.yml
Outdated
docker push ghcr.io/pytorch/pytorch-nightly:"${PYTORCH_NIGHTLY_COMMIT}${CUDA_SUFFIX}"

# Please note, here we need to pin a specific version of CUDA for the latest label
if [[ ${CUDA_VERSION_SHORT} == "12.1" ]]; then
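For context, a minimal sketch of what the step under review might amount to, assuming only the variable names visible in the diff and not the actual workflow code: the pinned CUDA build (12.1 here) would additionally be tagged and pushed as latest.

```bash
# Sketch only: re-tag the pinned CUDA 12.1 nightly image as "latest" and push it.
if [[ ${CUDA_VERSION_SHORT} == "12.1" ]]; then
  docker tag ghcr.io/pytorch/pytorch-nightly:"${PYTORCH_NIGHTLY_COMMIT}${CUDA_SUFFIX}" \
    ghcr.io/pytorch/pytorch-nightly:latest
  docker push ghcr.io/pytorch/pytorch-nightly:latest
fi
```

Pinning one CUDA version keeps latest pointing at a single, known build instead of whichever variant happens to push last.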
Do we need the latest label at all here? Maybe we can simply remove it. I see the following stats:
https://github.com/orgs/pytorch/packages/container/pytorch-nightly/212425390?tag=latest
Download activity:
- Total downloads: 0
- Last 30 days: 0
- Last week: 0
- Today: 0
Plz fix lint before landing
@pytorchmergebot merge -f "Failures are unrelated"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
As a follow-up to pytorch/pytorch#125617: add a validation_runner output param so callers know which validation runner to use.

Test:
```
python tools/scripts/generate_docker_release_matrix.py
{"include": [
  {"cuda": "11.8", "cuda_full_version": "11.8.0", "cudnn_version": "8", "image_type": "runtime", "docker": "ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240507-cuda11.8-cudnn8-runtime", "platform": "linux/amd64", "validation_runner": "linux.g5.4xlarge.nvidia.gpu"},
  {"cuda": "11.8", "cuda_full_version": "11.8.0", "cudnn_version": "8", "image_type": "devel", "docker": "ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240507-cuda11.8-cudnn8-devel", "platform": "linux/amd64", "validation_runner": "linux.g5.4xlarge.nvidia.gpu"},
  {"cuda": "12.1", "cuda_full_version": "12.1.1", "cudnn_version": "8", "image_type": "runtime", "docker": "ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240507-cuda12.1-cudnn8-runtime", "platform": "linux/amd64", "validation_runner": "linux.g5.4xlarge.nvidia.gpu"},
  {"cuda": "12.1", "cuda_full_version": "12.1.1", "cudnn_version": "8", "image_type": "devel", "docker": "ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240507-cuda12.1-cudnn8-devel", "platform": "linux/amd64", "validation_runner": "linux.g5.4xlarge.nvidia.gpu"},
  {"cuda": "12.4", "cuda_full_version": "12.4.0", "cudnn_version": "8", "image_type": "runtime", "docker": "ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240507-cuda12.4-cudnn8-runtime", "platform": "linux/amd64", "validation_runner": "linux.g5.4xlarge.nvidia.gpu"},
  {"cuda": "12.4", "cuda_full_version": "12.4.0", "cudnn_version": "8", "image_type": "devel", "docker": "ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240507-cuda12.4-cudnn8-devel", "platform": "linux/amd64", "validation_runner": "linux.g5.4xlarge.nvidia.gpu"},
  {"cuda": "cpu", "cuda_full_version": "", "cudnn_version": "", "image_type": "runtime", "docker": "ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240507-runtime", "platform": "linux/arm64", "validation_runner": "linux.arm64.2xlarge"}
]}
```
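As a rough illustration, assuming the script prints only the JSON matrix shown above to stdout and that jq is available locally, the platform-to-runner mapping can be listed like this:

```bash
# Sketch: print platform, validation runner, and image tag for each matrix entry.
python tools/scripts/generate_docker_release_matrix.py \
  | jq -r '.include[] | "\(.platform)\t\(.validation_runner)\t\(.docker)"'
```

In the output above, the linux/amd64 images map to the linux.g5.4xlarge.nvidia.gpu runner and the linux/arm64 image maps to linux.arm64.2xlarge.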
@pytorchbot cherry-pick --onto release/2.3 -c critical
Fixes #125094

Please note: the Docker CUDA 12.4 failure is an existing issue, related to the base image not being available on Docker Hub:
```
docker.io/nvidia/cuda:12.4.0-cudnn8-devel-ubuntu22.04: docker.io/nvidia/cuda:12.4.0-cudnn8-devel-ubuntu22.04: not found
```
https://github.com/pytorch/pytorch/actions/runs/8974959068/job/24648540236?pr=125617

Here is the reference issue: https://gitlab.com/nvidia/container-images/cuda/-/issues/225
Tracked on our side: pytorch/builder#1811

Pull Request resolved: #125617
Approved by: https://github.com/huydhn, https://github.com/malfet
(cherry picked from commit b29d77b)
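Not part of the PR, but one general way to confirm whether such an upstream tag exists, using a standard Docker CLI command:

```bash
# Queries the registry for the tag's manifest without pulling the image;
# the command fails if the tag does not exist upstream.
docker manifest inspect docker.io/nvidia/cuda:12.4.0-cudnn8-devel-ubuntu22.04
```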
Cherry picking #125617: the cherry-pick PR is at #126099. It is recommended to link a critical cherry-pick PR with an issue. Details for Dev Infra team: raised by workflow job.
Separate arm64 and amd64 docker builds (#125617), cherry picked from commit b29d77b onto release/2.3. Co-authored-by: atalman <atalman@fb.com>
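The commit title describes separating the arm64 and amd64 image builds; below is a minimal sketch of one general pattern for doing that (per-architecture builds stitched into a multi-arch tag), with a hypothetical TAG placeholder and simplified flags rather than the exact workflow code:

```bash
# Build and push each architecture separately, then stitch the per-arch
# tags into a single multi-arch manifest that works on either platform.
docker buildx build --platform linux/amd64 -t ghcr.io/pytorch/pytorch-nightly:TAG-amd64 --push .
docker buildx build --platform linux/arm64 -t ghcr.io/pytorch/pytorch-nightly:TAG-arm64 --push .
docker buildx imagetools create -t ghcr.io/pytorch/pytorch-nightly:TAG \
  ghcr.io/pytorch/pytorch-nightly:TAG-amd64 \
  ghcr.io/pytorch/pytorch-nightly:TAG-arm64
```

docker buildx imagetools create assembles the final manifest list, so users keep pulling a single tag regardless of architecture.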