Undefined symbol: cuOccupancyMaxActiveClusters #115075
Comments
Are we sure that we can still deliver CUDA 11.x wheels?
@bhack can you please run
We are forced on GKE Autopilot to stay on 11.x drivers.
Also, why is it missing?
In any case:
But it is redundant, as I've mentioned that I've already reproduced the error starting from the official PyTorch Docker image + nightly.
@bhack could you please post more details in the repro. We run torch.compile tests on nightly and they pass here:
The test you have pointed to is 12.1, right? We are talking about 11.8 wheels. @atalman wrote: no, there are multiple 11.8 wheel tests if you look here: https://github.com/pytorch/builder/actions/runs/7088836647/job/19292255936
@bhack just to confirm, this sort of error is not the case with older triton from the 2.1.x releases (where this symbol is not used)?
What do you mean?
If you try to use pytorch-2.1 on your system, does torch.compile work?
Yes, it is specific to nightly, but I have not bisected the commit.
I'm almost certain it's 8a90249, which included triton-lang/triton#2638 |
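For anyone else debugging this: a quick way to see whether the installed driver even exports the symbol is to try resolving it from libcuda directly. A minimal sketch, assuming libcuda.so.1 is on the loader path (this is not part of the original repro):

```python
# Minimal check: does the loaded CUDA driver export the symbol that the
# newer Triton build references? On an 11.x driver this raises AttributeError.
import ctypes

libcuda = ctypes.CDLL("libcuda.so.1")  # the driver's CUDA library

try:
    libcuda.cuOccupancyMaxActiveClusters  # driver API added in CUDA 12
    print("cuOccupancyMaxActiveClusters is exported by this driver")
except AttributeError:
    print("symbol not found: this driver predates CUDA 12, so a binary "
          "that references it at load time fails with an undefined symbol")
```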
I need to test nightly because I have a compiler bug to verify, and I don't want to open a new ticket if it was already solved in PyTorch nightly (or its updated Triton dependency).
I can't find the compiler test there. @bhack: Here it is: https://github.com/pytorch/builder/actions/runs/7088836647/job/19292255936#step:11:16568
@atalman Just to be on the same page, what is the container image of that CI test run?
If I am reading your last link correctly, we are running that test on a CPU image:
Let me try to propose a fix for the triton side |
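For context, the usual shape of such a workaround is to resolve the CUDA 12-only entry point lazily and fall back when the driver is older, instead of referencing the symbol at load time. Below is a rough ctypes sketch of that pattern; the actual Triton fix lives in its compiled driver bindings, so the names here are illustrative only:

```python
# Illustrative sketch of the lazy-resolution-with-fallback pattern; the real
# fix guards the call inside Triton's compiled driver code, not in Python.
import ctypes

libcuda = ctypes.CDLL("libcuda.so.1")

def max_active_clusters_entry_or_none():
    """Return the CUDA 12 occupancy entry point if the driver has it, else None."""
    try:
        return libcuda.cuOccupancyMaxActiveClusters
    except AttributeError:
        # Older (11.x) driver: treat cluster occupancy info as unavailable
        # instead of failing at library load time.
        return None

print("cluster occupancy query available:",
      max_active_clusters_entry_or_none() is not None)
```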
@malfet Are we not testing torch.compile on the GPU image in the CI?
@bhack
I want to know if the CI is testing 11.x |
Yes and no: the torch.compile feature of the cuda-11.8 builds is tested, but the containers pass through the NVIDIA driver, which is 535.54.03 for the whole CI fleet; see here: https://github.com/pytorch/builder/actions/runs/7088836647/job/19292250293#step:11:16925
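In other words, the 11.8 wheels are exercised against a 12.x driver, which does export the symbol, so the failure mode reported here never shows up in CI. A small sketch for comparing the two versions on a given machine, assuming a CUDA-enabled torch wheel and an NVIDIA driver are installed:

```python
# Sketch: compare the CUDA version the wheel targets with the CUDA version
# the installed driver supports (the mismatch this thread is about).
import ctypes
import torch

libcuda = ctypes.CDLL("libcuda.so.1")
ver = ctypes.c_int(0)
libcuda.cuDriverGetVersion(ctypes.byref(ver))  # e.g. 12020 -> CUDA 12.2

print("wheel built for CUDA :", torch.version.cuda)  # e.g. 11.8
print("driver supports CUDA :",
      f"{ver.value // 1000}.{(ver.value % 1000) // 10}")
```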
Yes, thanks. I looked at this a few minutes ago and I always see nvidia-smi output with CUDA 12.x.
This problem should be fixed with the latest triton; should we update the pin again?
For me, yes, but it is risky to keep delivering 11.x wheels while the CI (and probably the Triton upstream CI?) uses only 12.x drivers.
I tried PyTorch 2.2 RC1 and have the same issue. It'd be great if the fix were cherry-picked into 2.2.
I'm trying to help bump old issues this week and I'm not sure what to do with this one. Does anyone know if there's still an issue here?
Looks like a new symbol was added to Triton that needs a similar workaround...
🐛 Describe the bug
torch.compile is failing on nightly. Starting from the Docker image:
FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel
pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
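A minimal torch.compile script of the kind that surfaces the undefined-symbol error on this image (illustrative only, assuming a CUDA device is visible inside the container; it is not the exact workload from the original report):

```python
# Minimal torch.compile smoke test; on this setup it fails during
# compilation with the cuOccupancyMaxActiveClusters undefined-symbol error.
import torch

@torch.compile
def f(x):
    return torch.sin(x) + torch.cos(x)

x = torch.randn(1024, device="cuda")
print(f(x).sum())
```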
Versions
torch-2.2.0.dev20231204+cu118-cp310-cp310-linux_x86_64.whl
cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519