PyTorch nightly and openAI/triton CUDA #106144
Comments
Can you please run |
I need to check if I can get exactly that env again.
The env was mainly: |
Hi all, I am seeing similar problems resulting in … My environment was built as a fresh … I get the following reason for the compilation failure:
This is observed for the following environment:
The script that reproduces this for my build is rather simple:
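(The original reproduction script is not shown above. The following is only a rough sketch, with an assumed toy function and tensor shapes, of the kind of torch.compile call that exercises the Inductor → Triton → ptxas path.)

```python
# Minimal sketch of a torch.compile repro; the function and shapes are
# placeholders, not the reporter's actual script.
import torch

@torch.compile
def f(x: torch.Tensor) -> torch.Tensor:
    return torch.sin(x) + torch.cos(x)

x = torch.randn(1024, 1024, device="cuda")
print(f(x).sum())  # first call triggers Inductor -> Triton -> ptxas compilation
```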
|
Not reproducible on V100s.
Output:
Env:
Will try on A100 next. |
Yes, good catch! It would be incompatible with the CUDA 12.x stack, but given we are installing the CUDA 11.8 nightly PyTorch wheels, I had assumed Triton uses CUDA 11.x, too. Do you know if |
Also no repro on A100.
if I install the CUDA 11.8 PyTorch nightlies. |
Yes, Triton always uses CUDA 12. |
As discussed in the mentioned triton-lang/triton#1955, it seems to be hardcoded, right? IMHO the main problem is more that the CI currently does not cover this case with regular tests. |
@bhack yes, it is. But one can (in theory) ask Triton to use a different ptxas using … It's somewhat hard to test something like that in CI, as runners are provisioned with the latest kernel driver in order to be usable with both CUDA 12 and CUDA 11.8. Also, the older driver is less stable, so we ran into multiple hangs/segfaults that were mitigated by installing a newer driver. |
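(As a rough illustration of the "different ptxas" idea: recent Triton versions consult the TRITON_PTXAS_PATH environment variable; whether the pinned commit honours it is an assumption worth verifying.)

```python
# Sketch, not verified against the pinned commit: point Triton at a system
# ptxas (e.g. one from a CUDA 11.8 toolkit) via TRITON_PTXAS_PATH before any
# compilation happens.
import os
import shutil

system_ptxas = shutil.which("ptxas")  # assumes a CUDA toolkit ptxas is on PATH
if system_ptxas is not None:
    os.environ["TRITON_PTXAS_PATH"] = system_ptxas

import torch

@torch.compile
def square(x):
    return x * x

print(square(torch.randn(16, device="cuda")))
```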
OK, but it seems that the CI here is not testing this config, right? Also, is the Triton CI still testing 11.x on the commit hash we have picked? |
As we are approaching the release with #108055, can we re-label this one? |
Are we sure that we can deliver reliable 11.x wheels? |
Tried reproducing this on A100 with
using code from comment: #106144 (comment)
|
Can you recheck #106144 (comment)? |
@atalman You need to rerun with nightly:
@malfet What do you think about the current PyTorch nightly (and also the upcoming stable 2.1.2) CUDA 11.x wheel status? |
We're working on a triton 2.1.0 conda package for defaults, FYI, which will have mlir and cudatoolkit unvendored. We'll try to use cudatoolkit 11.8 for this, although we haven't checked yet if triton's using any API entry points that were introduced with 12.x |
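(One rough way to check that, assuming a Linux/ELF build with binutils' nm available: scan the compiled Triton extension for undefined cuTensorMap* driver symbols, which were introduced with CUDA 12.)

```python
# Sketch: list CUDA 12-era driver symbols referenced by Triton's compiled
# extension(s). Assumes Linux, an installed triton package, and `nm` on PATH.
import pathlib
import subprocess
import triton

for ext in pathlib.Path(triton.__file__).parent.rglob("*.so"):
    out = subprocess.run(
        ["nm", "-D", "--undefined-only", str(ext)],
        capture_output=True, text=True, check=False,
    ).stdout
    hits = sorted({line.split()[-1] for line in out.splitlines() if "cuTensorMap" in line})
    if hits:
        print(ext, "->", ", ".join(hits))
```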
@danpetry for 2.1.0 it does not, but for 2.2.0 we need to cherry-pick the change to make the CUDA-12-specific API call optional: pytorch/.github/scripts/build_triton_wheel.py, Lines 88 to 92 in 3e47e3f
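(The linked snippet is not reproduced here; the sketch below only illustrates the general idea of making a CUDA 12-only call optional by probing the driver version at runtime, and is not the actual build_triton_wheel.py patch.)

```python
# Sketch of guarding a CUDA 12-only code path at runtime. Assumes Linux and
# that libcuda.so.1 is loadable; cuDriverGetVersion can be called without
# cuInit and returns e.g. 11080 for 11.8 or 12020 for 12.2.
import ctypes

def cuda_driver_version() -> int:
    libcuda = ctypes.CDLL("libcuda.so.1")
    version = ctypes.c_int(0)
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
        raise RuntimeError("cuDriverGetVersion failed")
    return version.value

if cuda_driver_version() >= 12000:
    pass  # safe to use the CUDA 12-only API here
else:
    pass  # fall back to a code path that works on an 11.x driver
```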
|
OK, thanks. It looks like a pure Python package recipe is generated; does your recipe compile triton from source? |
No, it's not a pure Python package. |
Can we cherry-pick or update the commit SHA for triton-lang/triton#3053? Compiling on PyTorch nightly is broken. |
🐛 Describe the bug
If I am not wrong, the nightly PyTorch (CUDA 11.8) wheels are not compatible with the pinned Triton commit, as I am seeing something like triton-lang/triton#1955.
See more:
https://discuss.pytorch.org/t/any-change-of-using-cuda-12-2/184461/6
If this is true, why is the CI not failing with tests on the PyTorch nightly CUDA 11.8 wheels?
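For reference, a small diagnostic sketch that prints the CUDA version PyTorch was built with alongside the version of the ptxas bundled in the Triton wheel (the ptxas location inside the wheel is an assumption and has moved between releases, hence the recursive search):

```python
# Diagnostic sketch: surface the torch-vs-bundled-ptxas CUDA version mismatch.
import pathlib
import subprocess
import torch
import triton

print("torch", torch.__version__, "built with CUDA", torch.version.cuda)
print("triton", triton.__version__)

for ptxas in pathlib.Path(triton.__file__).parent.rglob("ptxas"):
    out = subprocess.run([str(ptxas), "--version"], capture_output=True, text=True).stdout
    release = next((line for line in out.splitlines() if "release" in line), out.strip())
    print(ptxas, "->", release)
```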
Versions
nightly
cc @seemethere @malfet @osalpekar @atalman @ptrblck @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519 @gchanan