`has_triton`: Use the device interface for detecting Triton availability #139171

galexite · 2024-10-29T09:08:00Z

This PR replaces the global has_triton() function, which was previously used for this task, with a check through the device interface.
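To illustrate the idea, here is a minimal, self-contained sketch of per-device availability detection. Every name in it (`DeviceInterface`, `is_triton_available`, `register_interface`, and the stand-in `has_triton`) is a hypothetical placeholder chosen for illustration; it is not the PyTorch API and not the code in this PR.

```python
from typing import Callable, Dict, Optional


class DeviceInterface:
    """Hypothetical per-device interface; not the real PyTorch class."""

    def __init__(self, name: str, triton_probe: Callable[[], bool]) -> None:
        self.name = name
        self._triton_probe = triton_probe

    def is_triton_available(self) -> bool:
        # Each device type decides for itself whether Triton can target it.
        return self._triton_probe()


# Hypothetical registry mapping a device type to its interface.
_INTERFACES: Dict[str, DeviceInterface] = {}


def register_interface(interface: DeviceInterface) -> None:
    _INTERFACES[interface.name] = interface


def has_triton(device: Optional[str] = None) -> bool:
    """Stand-in for a global check that delegates to the device interfaces."""
    if device is None:
        # No device given: available if any registered device supports Triton.
        return any(i.is_triton_available() for i in _INTERFACES.values())
    interface = _INTERFACES.get(device)
    return interface is not None and interface.is_triton_available()


# Example registrations with hard-coded probes (assumptions for the demo).
register_interface(DeviceInterface("cuda", lambda: True))
register_interface(DeviceInterface("cpu", lambda: False))

print(has_triton())       # True
print(has_triton("cpu"))  # False
print(has_triton("xpu"))  # False (no interface registered)
```

The point of the design is that the answer can differ per device type, instead of being a single process-wide flag.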

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @kwen2501 @c-p-i-o @yf225 @ColinPeppler @desertfire @rec

pytorch-bot · 2024-10-29T09:08:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139171

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit d4b2963 with merge base 8904ba6 ():

NEW FAILURES - The following jobs have failed:

pull / linux-focal-py3_9-clang9-xla / build (gh)
ninja: build stopped: subcommand failed
pull / linux-jammy-rocm-py3.10 / build (gh)
Final attempt failed. Child_process exited with error code 1

BROKEN TRUNK - The following job failed, but the failure was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / linux-focal-cuda12.6-py3.10-gcc11-sm89 / test (default, 3, 5, lf.ephemeral.linux.g6.4xlarge.experimental.nvidia.gpu) (gh) (trunk failure)
dynamo/test_structured_trace.py::StructuredTraceTest::test_ddp_graphs

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.ephemeral.linux.2xlarge) (gh) (#144480)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

galexite · 2024-10-29T09:14:45Z

@pytorchbot label 'topic: not user facing'

galexite · 2024-11-01T11:52:55Z

@jansel sorry about that, looks like I messed up the typing, should be ready for another CI run now!

jansel · 2024-11-01T18:32:25Z

@pytorchbot merge

pytorchmergebot · 2024-11-01T18:34:09Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-11-01T18:55:28Z

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

galexite · 2024-11-15T16:21:01Z

@jansel @tugsbayasgalan I've fixed the tests. Calling init_backends_registration in inductor_utils.py eventually imports torch._inductor.ir, which makes additional HOPs from Inductor appear, so I've added those to the list of HOPs without op info here: https://github.com/pytorch/pytorch/pull/139171/files#diff-2a1edf2b2655350ac32e1d24b4cfaf5b65d90056c4fcb2f77600245e428d5131R78-R84. I hope this is okay.

galexite · 2024-11-26T08:14:08Z

Hey @jansel, could I trouble you for a re-review please? Thanks!

galexite · 2025-05-06T08:10:23Z

Rebased since #152529 was merged.

I have removed the changes to the Inductor scheduler checks, since I think that is where the problem may lie. This PR now only includes the has_triton component. I'll submit the Inductor scheduler checks as a separate PR.

I'm hoping this PR will now pass when workflows are rerun.

galexite · 2025-05-07T08:48:33Z

@pytorchbot merge

pytorchmergebot · 2025-05-07T08:51:12Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…ity (#139171) Summary: This PR replaces the `has_triton()` global method which was previously used for this task. X-link: pytorch/pytorch#139171 Approved by: https://github.com/jansel, https://github.com/shink Reviewed By: huydhn Differential Revision: D74338720 fbshipit-source-id: 27106df937bbdea2da1f4911ffffcfae056f844d

masnesral · 2025-05-09T22:25:15Z

Looks like this causes a pretty big perf drop on some huggingface models. For example, on an H100 the speedup drops from 2.6x to 1.3x for the following:
python benchmarks/dynamo/huggingface.py --performance --inference --bfloat16 --backend inductor --device cuda --cold-start-latency --only BlenderbotSmallForCausalLM
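For scale, here is a quick calculation of what that change means, assuming the two figures are the speedups reported by the benchmark. The helper name `relative_drop` is made up for illustration.

```python
def relative_drop(before: float, after: float) -> float:
    """Fraction of the original speedup that was lost."""
    return (before - after) / before


drop = relative_drop(2.6, 1.3)
print(f"{drop:.0%} of the speedup lost")  # 50% of the speedup lost
```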

galexite · 2025-05-10T06:50:39Z

Hi @masnesral, if this PR is causing those performance drops, it might be because has_triton returns False, so hand-written Triton kernels aren't being used. What happens if you do the following in a Python REPL?

from torch.utils._triton import has_triton
print(f"{has_triton()=}")
print(f"{has_triton('cuda')=}")  # inner single quotes: nested double quotes are a syntax error before Python 3.12

from torch._dynamo.device_interface import get_interface_for_device
get_interface_for_device("cuda").raise_if_triton_unavailable()  # shouldn't throw if all is okay

Unfortunately, I don't have access to an H100, but this should work the same there. On a g5.8xlarge AWS instance these give True for me, and the last call doesn't throw.

Also, to help with debugging this, could you also tell me what the results of these are for you?

import triton, torch
print(f'{"nvidia" in triton.backends.backends=}')
print(f'{torch.cuda.get_device_properties("cuda")=}')
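The checks above can also be bundled into one script that keeps going when a step fails, so a single paste reports everything. This is only a sketch: `probe` and `collect_triton_diagnostics` are hypothetical helper names, and the device argument to `has_triton` and the `raise_if_triton_unavailable` method are assumed to exist only with this PR applied, which is why every call is guarded.

```python
import importlib
from typing import Any, Callable, Dict


def probe(fn: Callable[[], Any]) -> str:
    """Run one check and report its result or the exception it raised."""
    try:
        return repr(fn())
    except Exception as exc:  # keep going so every check gets reported
        return f"{type(exc).__name__}: {exc}"


def collect_triton_diagnostics(device: str = "cuda") -> Dict[str, str]:
    def no_arg_check() -> Any:
        from torch.utils._triton import has_triton
        return has_triton()

    def device_check() -> Any:
        # Assumption: has_triton only accepts a device with this PR applied.
        from torch.utils._triton import has_triton
        return has_triton(device)

    def interface_check() -> Any:
        from torch._dynamo.device_interface import get_interface_for_device
        # Assumption: this method only exists with this PR applied.
        get_interface_for_device(device).raise_if_triton_unavailable()
        return "did not raise"

    def backend_check() -> Any:
        triton = importlib.import_module("triton")
        return "nvidia" in triton.backends.backends

    def properties_check() -> Any:
        torch = importlib.import_module("torch")
        return torch.cuda.get_device_properties(device)

    checks: Dict[str, Callable[[], Any]] = {
        "has_triton()": no_arg_check,
        "has_triton(device)": device_check,
        "raise_if_triton_unavailable": interface_check,
        "nvidia backend registered": backend_check,
        "device properties": properties_check,
    }
    return {name: probe(fn) for name, fn in checks.items()}


if __name__ == "__main__":
    for name, outcome in collect_triton_diagnostics().items():
        print(f"{name}: {outcome}")
```

On a machine without torch or triton installed, each entry simply reports the import error instead of aborting the run.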

galexite · 2025-05-10T07:08:14Z

The other thing is that I changed the if condition in Inductor which enables the pad_mm pass: it now checks whether the tensor is on a device actively using the TritonScheduler, rather than calling has_triton. Maybe that incorrectly causes an early exit of that Inductor pass?
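To make that hypothesis concrete, here is a toy sketch of how a gate that depends on a per-device scheduler lookup can exit early where a global check would not. Everything in it is hypothetical (`should_pad_global`, `should_pad_per_device`, the `schedulers` mapping, and the `"TritonScheduler"` label). It is not the Inductor code, and whether this is what actually happens in the real pass is unverified.

```python
from typing import Dict, Optional


def should_pad_global(triton_installed: bool) -> bool:
    # Old-style gate: one process-wide answer, independent of the tensor.
    return triton_installed


def should_pad_per_device(
    device: str, scheduler_for_device: Dict[str, Optional[str]]
) -> bool:
    # New-style gate: proceed only if this device is registered with a
    # Triton-based scheduler at the moment the pass asks.
    return scheduler_for_device.get(device) == "TritonScheduler"


# Assumed scenario: Triton is installed, but no scheduler has been
# registered for "cuda" yet when the pass runs.
schedulers: Dict[str, Optional[str]] = {"cuda": None}

print(should_pad_global(True))                    # True
print(should_pad_per_device("cuda", schedulers))  # False

# Once the scheduler is registered, both gates agree again.
schedulers["cuda"] = "TritonScheduler"
print(should_pad_per_device("cuda", schedulers))  # True
```

If the real check behaves like the second gate, its result would depend on registration order, which is one way the pass could be skipped without has_triton ever returning False.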

masnesral · 2025-05-10T14:44:23Z

The perf drop is also pretty noticeable on A100. See for example:
https://hud.pytorch.org/benchmark/huggingface/inductor_with_cudagraphs?dashboard=torchinductor&startTime=Fri,%2002%20May%202025%2020:46:14%20GMT&stopTime=Fri,%2009%20May%202025%2020:46:14%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=main&lCommit=aca2c99a6528349bb67481310c56a9b08f39934b&rBranch=main&rCommit=5c878d4b04be45c86ab280d0d5e33ba072c9dcb3

masnesral · 2025-05-10T14:44:47Z

Sorry, gonna try to revert this while we investigate further.

masnesral · 2025-05-10T14:44:50Z

@pytorchbot revert -m="Performance regression for huggingface" -c=nosignal

pytorchmergebot · 2025-05-10T14:46:17Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

…vailability (#139171)" This reverts commit 48bfe9a. Reverted #139171 on behalf of https://github.com/masnesral due to Performance regression for huggingface ([comment](#139171 (comment)))

pytorchmergebot · 2025-05-10T14:46:26Z

@galexite your PR has been successfully reverted.

This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.

masnesral · 2025-05-10T15:08:59Z

@galexite my having tested on H100 doesn't seem to be related (see links to a100 above), but here's the output you requested anyway. I think it's what you were expecting :/

>>> import triton, torch
>>> print(f'{"nvidia" in triton.backends.backends=}')
"nvidia" in triton.backends.backends=True
>>> print(f'{torch.cuda.get_device_properties("cuda")=}')
torch.cuda.get_device_properties("cuda")=_CudaDeviceProperties(name='NVIDIA H100', major=9, minor=0, total_memory=97285MB, multi_processor_count=132, uuid=70fa10a6-2939-4471-959c-6da3b40decb6, pci_bus_id=6, pci_device_id=0, pci_domain_id=0, L2_cache_size=60MB)

>>> from torch.utils._triton import has_triton
>>> print(f"{has_triton()=}")
has_triton()=True
>>> print(f"{has_triton('cuda')=}")
has_triton('cuda')=True
>>> from torch._dynamo.device_interface import get_interface_for_device
>>> get_interface_for_device("cuda").raise_if_triton_unavailable()

galexite · 2025-05-10T15:20:53Z

Hmm, okay. I'll have a look!

…vailability (#139171)" Summary: This reverts commit 48bfe9afc70a98addd5aa738bf501c029e4a9285. Reverted pytorch/pytorch#139171 on behalf of https://github.com/masnesral due to Performance regression for huggingface ([comment](pytorch/pytorch#139171 (comment))) Reviewed By: huydhn Differential Revision: D74531472 fbshipit-source-id: 751398ae3c03cdd1d1d7c75a5088207a3a1784cb

galexite requested a review from zou3519 as a code owner October 29, 2024 09:08

pytorch-bot bot added module: dynamo module: inductor release notes: sparse release notes category labels Oct 29, 2024

pytorch-bot bot added the topic: not user facing topic category label Oct 29, 2024

pytorchbot added the open source label Oct 29, 2024

zou3519 removed their request for review October 29, 2024 20:52

cpuhrsch requested review from jansel, Chillee and eellison and removed request for Chillee October 31, 2024 22:16

cpuhrsch added triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module and removed release notes: sparse release notes category labels Oct 31, 2024

jansel previously approved these changes Oct 31, 2024

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 1, 2024

pytorchmergebot added the merging label Nov 1, 2024

pytorchmergebot removed the merging label Nov 1, 2024

galexite requested a review from mruberry as a code owner November 4, 2024 13:21

galexite requested review from tugsbayasgalan and ydwu4 as code owners November 13, 2024 11:03

eellison removed their request for review November 20, 2024 17:09

pytorch-bot bot removed ciflow/inductor ciflow/xpu Run XPU CI tasks labels May 6, 2025

galexite changed the title ~~Use the device interface for detecting Triton availability~~ has_triton: Use the device interface for detecting Triton availability May 6, 2025

galexite added 3 commits May 6, 2025 08:12

Undo changes to _content_store.py, they are not necessary with these changes

6459143

Undo changes to verify_dynamo.py

63f344f

Remove comment change from logging.py

d4b2963

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 7, 2025

pytorchmergebot added the merging label May 7, 2025

pytorchmergebot closed this in 48bfe9a May 7, 2025

pytorchmergebot removed the merging label May 7, 2025

pytorchmergebot reopened this May 10, 2025
