[ROCm] Enable mi300-specific workflows to be triggered on PRs by jithunnair-amd · Pull Request #147904 · pytorch/pytorch · GitHub

[ROCm] Enable mi300-specific workflows to be triggered on PRs #147904


Closed
wants to merge 6 commits into from


jithunnair-amd (Collaborator) commented Feb 25, 2025

This change is needed so that the MI300-specific CI workflows can be triggered on PRs by applying a PR label.

  • inductor-rocm-mi300.yml uses the existing ciflow/inductor-rocm label so that any PR manually labeled as such will trigger inductor config runs on both MI200 and MI300.
  • rocm-mi300.yml uses a separate ciflow/rocm-mi300 label because MI300 runner capacity is limited and we don't want to over-trigger default config runs on them; the ciflow/rocm label is automatically applied to many PRs, so reusing it would cause exactly that over-triggering.
  • inductor-perf-test-nightly-rocm.yml uses a separate ciflow/inductor-perf-test-nightly-rocm label, so that we can manually trigger a round of perf testing on MI300 runners to test the perf impact of a major inductor-related change.
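The label-to-workflow mapping described above can be modeled as a small lookup. This is a hypothetical Python sketch, not code from the PR: the dictionary `MI300_WORKFLOW_LABELS` and the helper `triggered_mi300_workflows` are illustrative names, and the mapping covers only the three MI300 workflow files named in the description (the existing MI200 workflows are left out).

```python
# Hypothetical model of the label -> MI300 workflow mapping described in this PR.
# Not taken from the PR itself; all names here are illustrative only.
MI300_WORKFLOW_LABELS = {
    # Shares the existing label, so inductor runs happen on both MI200 and MI300.
    "inductor-rocm-mi300.yml": "ciflow/inductor-rocm",
    # Separate label, so the auto-applied ciflow/rocm label never reaches MI300.
    "rocm-mi300.yml": "ciflow/rocm-mi300",
    # Separate label for manually requested perf runs on MI300.
    "inductor-perf-test-nightly-rocm.yml": "ciflow/inductor-perf-test-nightly-rocm",
}


def triggered_mi300_workflows(pr_labels):
    """Return, sorted, the MI300 workflow files the given PR labels would trigger."""
    labels = set(pr_labels)
    return sorted(
        workflow
        for workflow, label in MI300_WORKFLOW_LABELS.items()
        if label in labels
    )


if __name__ == "__main__":
    # The automatically applied ciflow/rocm label alone triggers no MI300 workflow.
    print(triggered_mi300_workflows(["ciflow/rocm"]))
    # Manually adding ciflow/inductor-rocm also pulls in the MI300 inductor runs.
    print(triggered_mi300_workflows(["ciflow/rocm", "ciflow/inductor-rocm"]))
```

Keeping rocm-mi300.yml on its own label is what protects MI300 capacity in this model: a PR that only carries the automatically applied ciflow/rocm label maps to an empty list.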

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

@jithunnair-amd jithunnair-amd requested a review from a team as a code owner February 25, 2025 23:47
pytorch-bot bot commented Feb 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147904

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 31 Pending

As of commit e3b5b93 with merge base e4c558b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/rocm Trigger "default" config CI on ROCm module: rocm AMD GPU support for Pytorch topic: not user facing topic category labels Feb 25, 2025
janeyx99 (Contributor) left a comment

Why shouldn't these just trigger as a subset of the ciflow/inductor-rocm labels?

@janeyx99 janeyx99 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Mar 3, 2025
@jithunnair-amd jithunnair-amd force-pushed the enable_mi300_workflows_on_PRs branch from 98069fd to fb820c9 March 4, 2025 20:49
@jithunnair-amd jithunnair-amd added the ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 label Mar 4, 2025
pytorch-bot bot commented Mar 4, 2025

Warning: Unknown label ciflow/rocm-mi300.
Currently recognized labels are

  • ciflow/binaries
  • ciflow/binaries_libtorch
  • ciflow/binaries_wheel
  • ciflow/inductor
  • ciflow/inductor-periodic
  • ciflow/inductor-rocm
  • ciflow/inductor-perf-compare
  • ciflow/inductor-micro-benchmark
  • ciflow/inductor-micro-benchmark-cpu-x86
  • ciflow/inductor-cu126
  • ciflow/linux-aarch64
  • ciflow/mps
  • ciflow/nightly
  • ciflow/periodic
  • ciflow/rocm
  • ciflow/s390
  • ciflow/slow
  • ciflow/trunk
  • ciflow/unstable
  • ciflow/xpu
  • ciflow/torchbench
  • ciflow/autoformat

Please add the new label to .github/pytorch-probot.yml
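The warning above fires because the new label is not yet in the bot's recognized set. The following hypothetical Python sketch models that check; it is not the bot's actual implementation. The recognized set is copied from the warning, while the function names and the tag format in `ciflow_tag` are assumptions: as we understand PyTorch's ciflow setup, recognized labels are listed under `ciflow_push_tags` in .github/pytorch-probot.yml, and the bot then pushes a git tag of the form `<label>/<PR number>` that workflows match with an `on: push: tags:` filter.

```python
# Hypothetical sketch of the recognized-label check behind the warning above.
# The real logic lives in PyTorch's bot infrastructure; this only models it.

# Labels the bot reported as recognized at the time of this PR.
RECOGNIZED_CIFLOW_LABELS = {
    "ciflow/binaries", "ciflow/binaries_libtorch", "ciflow/binaries_wheel",
    "ciflow/inductor", "ciflow/inductor-periodic", "ciflow/inductor-rocm",
    "ciflow/inductor-perf-compare", "ciflow/inductor-micro-benchmark",
    "ciflow/inductor-micro-benchmark-cpu-x86", "ciflow/inductor-cu126",
    "ciflow/linux-aarch64", "ciflow/mps", "ciflow/nightly", "ciflow/periodic",
    "ciflow/rocm", "ciflow/s390", "ciflow/slow", "ciflow/trunk",
    "ciflow/unstable", "ciflow/xpu", "ciflow/torchbench", "ciflow/autoformat",
}


def unknown_ciflow_labels(pr_labels, recognized=RECOGNIZED_CIFLOW_LABELS):
    """Return, sorted, the ciflow/* labels on a PR that are not recognized."""
    return sorted(
        label
        for label in pr_labels
        if label.startswith("ciflow/") and label not in recognized
    )


def ciflow_tag(label, pr_number):
    """Assumed tag format '<label>/<PR number>' (an assumption, see lead-in)."""
    return f"{label}/{pr_number}"


if __name__ == "__main__":
    labels = ["ciflow/rocm", "ciflow/rocm-mi300", "ciflow/inductor-perf-test-nightly-rocm"]
    # Both new labels are reported as unknown until pytorch-probot.yml lists them.
    print(unknown_ciflow_labels(labels))
    print(ciflow_tag("ciflow/rocm-mi300", 147904))
```

Labels that do not start with ciflow/ (such as module: rocm) are ignored by this check, since only ciflow labels drive CI triggering.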

@jithunnair-amd jithunnair-amd requested a review from janeyx99 March 4, 2025 21:17
@janeyx99 janeyx99 removed their request for review March 4, 2025 21:40
janeyx99 (Contributor) commented Mar 4, 2025

Ah, thanks for updating the description. I think this is agreeable, but I will let @huydhn sign off as an owner of the infra.

huydhn (Contributor) left a comment

LGTM! A note that you can also run the nightly ROCm performance benchmark via https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly-rocm.yml, but only folks with write access to PyTorch can use it. Using ciflow is of course more convenient. But if capacity becomes an issue, we could remove ciflow/inductor-perf-test-nightly-rocm to restrict who can run the benchmark (that's what is done for the H100 benchmark).

jithunnair-amd (Collaborator, Author)

@huydhn Yes, I realize that the ciflow label method opens it up to being runnable by many more devs, but since we don't have permissions to trigger workflow_dispatch via the GitHub UI, I thought the ciflow label method was the next best solution.
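For contributors who do have write access, the workflow_dispatch route mentioned above can also be used outside the GitHub UI through GitHub's REST endpoint `POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`, which accepts a workflow file name as `workflow_id`. The Python sketch below only builds the request without sending it; the helper name, the `ref="main"` choice, and the token placeholder are assumptions for illustration, and the call only succeeds if the workflow declares a `workflow_dispatch` trigger and the token has write access to the repository.

```python
import json
import urllib.request


def build_workflow_dispatch_request(owner, repo, workflow_file, ref, token):
    """Build (but do not send) a POST to GitHub's workflow_dispatch endpoint.

    Hypothetical helper. The target workflow must declare an
    'on: workflow_dispatch' trigger, and the token must carry write access.
    """
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # 'ref' is the branch or tag whose version of the workflow should run.
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    req = build_workflow_dispatch_request(
        "pytorch", "pytorch", "inductor-perf-test-nightly-rocm.yml",
        ref="main", token="<YOUR_TOKEN>",  # placeholder, not a real token
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it; GitHub answers
    # 204 No Content when the dispatch is accepted.
```

This is exactly the permission boundary discussed in the thread: without write access the endpoint rejects the request, which is why the ciflow label route opens the benchmark to more developers.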

@jithunnair-amd jithunnair-amd added the ciflow/inductor-perf-test-nightly-rocm Trigger inductor perf tests on ROCm label Mar 5, 2025
pytorch-bot bot commented Mar 5, 2025

Warning: Unknown label ciflow/inductor-perf-test-nightly-rocm.
Currently recognized labels are

  • ciflow/binaries
  • ciflow/binaries_libtorch
  • ciflow/binaries_wheel
  • ciflow/inductor
  • ciflow/inductor-periodic
  • ciflow/inductor-rocm
  • ciflow/inductor-perf-compare
  • ciflow/inductor-micro-benchmark
  • ciflow/inductor-micro-benchmark-cpu-x86
  • ciflow/inductor-cu126
  • ciflow/linux-aarch64
  • ciflow/mps
  • ciflow/nightly
  • ciflow/periodic
  • ciflow/rocm
  • ciflow/s390
  • ciflow/slow
  • ciflow/trunk
  • ciflow/unstable
  • ciflow/xpu
  • ciflow/torchbench
  • ciflow/autoformat

Please add the new label to .github/pytorch-probot.yml

@jithunnair-amd jithunnair-amd added the ciflow/inductor-rocm Trigger "inductor" config CI on ROCm label Mar 5, 2025
jithunnair-amd (Collaborator, Author)

Successfully triggered the workflows using labels.

jithunnair-amd (Collaborator, Author)

@pytorchbot merge -f "Lint jobs passed. Successfully triggered relevant jobs, completing jobs is not required for this PR"

pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@jithunnair-amd jithunnair-amd deleted the enable_mi300_workflows_on_PRs branch March 5, 2025 13:21
jithunnair-amd added a commit that referenced this pull request Mar 13, 2025
This change is needed so that the MI300-specific CI workflows can be triggered on PRs by applying a PR label.

* inductor-rocm-mi300.yml uses the existing `ciflow/inductor-rocm` label so that any PR manually labeled as such will trigger `inductor` config runs on both MI200 and MI300.
* rocm-mi300.yml uses a separate `ciflow/rocm-mi300` label, since we don't want to over-trigger `default` config runs on MI300 runners due to their limited capacity, and the [`ciflow/rocm` label is automatically applied](https://github.com/pytorch/test-infra/blob/79438512a0632583899938d3b0277da78f5569e0/torchci/lib/bot/autoLabelBot.ts#L24) to many PRs.
* inductor-perf-test-nightly-rocm.yml uses a separate `ciflow/inductor-perf-test-nightly-rocm` label, so that we can manually trigger a round of perf testing on MI300 runners to test the perf impact of a major inductor-related change.

Pull Request resolved: #147904
Approved by: https://github.com/huydhn
Labels
  • ciflow/inductor-perf-test-nightly-rocm (Trigger inductor perf tests on ROCm)
  • ciflow/inductor-rocm (Trigger "inductor" config CI on ROCm)
  • ciflow/rocm (Trigger "default" config CI on ROCm)
  • ciflow/rocm-mi300 (Trigger "default" config CI on ROCm MI300)
  • Merged
  • module: rocm (AMD GPU support for Pytorch)
  • open source
  • topic: not user facing (topic category)
  • triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

5 participants