Undefined symbol: cuOccupancyMaxActiveClusters · Issue #115075 · pytorch/pytorch · GitHub

Undefined symbol: cuOccupancyMaxActiveClusters #115075

Open
bhack opened this issue Dec 4, 2023 · 28 comments
Labels
oncall: pt2 triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@bhack
Contributor
bhack commented Dec 4, 2023

🐛 Describe the bug

torch.compile is failing on nightly.
Starting from the Docker image pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel and then running:

pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118

torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
 ImportError: /tmp/torchinductor_root/triton/0/4fcaf02726b7cedfebb83bbf158e9de6/cuda_utils.so: undefined symbol: cuOccupancyMaxActiveClusters
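
For reference, any small module pushed through the inductor backend is enough to hit this at its first compiled call; the model below is just an assumed minimal sketch, not the original workload:

import torch

# Any tiny module suffices: inductor compiles Triton kernels whose
# cuda_utils.so references cuOccupancyMaxActiveClusters at load time.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).cuda()
compiled = torch.compile(model, backend="inductor")

x = torch.randn(4, 8, device="cuda")
print(compiled(x).shape)  # first call triggers compilation and the ImportError above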

Versions

torch-2.2.0.dev20231204+cu118-cp310-cp310-linux_x86_64.whl

cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519

@bhack
Contributor Author
bhack commented Dec 4, 2023

Are we sure that we can still deliver CUDA 11.x wheels?

triton-lang/triton#2062

@malfet malfet added needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user oncall: pt2 labels Dec 4, 2023
@malfet
Contributor
malfet commented Dec 4, 2023

@bhack can you please run python3 -m torch.utils.collect_env and post the results here (though I guess I already know the answer: you have a 450.xx driver or something)

@bhack
Contributor Author
bhack commented Dec 4, 2023

We are forced on GKE Autopilot to stay on the 470.x driver; we have no other possible choice.

@bhack
Contributor Author
bhack commented Dec 4, 2023

Also, why is the cuOccupancyMaxActiveClusters symbol undefined? Is it available only in CUDA 12.x?
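
One way to check whether the installed driver actually exports the symbol is to probe libcuda directly; a minimal sketch, assuming libcuda.so.1 is on the loader path:

import ctypes

# Newer drivers export cuOccupancyMaxActiveClusters (thread-block-cluster
# occupancy query); the 470.x / CUDA 11.4 driver series does not, which is
# why the generated cuda_utils.so fails to load against it.
libcuda = ctypes.CDLL("libcuda.so.1")
print(hasattr(libcuda, "cuOccupancyMaxActiveClusters"))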

@bhack
Contributor Author
bhack commented Dec 4, 2023

in any case:

[pip3] numpy==1.26.0
[pip3] pytorch-triton==2.1.0+bcad9dabe1
[pip3] torch==2.2.0.dev20231204+cu118
[pip3] torchaudio==2.2.0.dev20231204+cu118
[pip3] torchelastic==0.2.2
[pip3] torchvision==0.17.0.dev20231204+cu118
[pip3] triton==2.1.0
[conda] blas                      1.0                         mkl  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] libjpeg-turbo             2.0.0                h9bf148f_0    pytorch
[conda] mkl                       2023.1.0         h213fc3f_46343  
[conda] mkl-service               2.4.0           py310h5eee18b_1  
[conda] mkl_fft                   1.3.8           py310h5eee18b_0  
[conda] mkl_random                1.2.4           py310hdb19cb5_0  
[conda] numpy                     1.26.0          py310h5f9d8c6_0  
[conda] numpy-base                1.26.0          py310hb5e798b_0  
[conda] pytorch-cuda              11.8                 h7e8668a_5    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pytorch-triton            2.1.0+bcad9dabe1          pypi_0    pypi
[conda] torch                     2.2.0.dev20231204+cu118          pypi_0    pypi
[conda] torchaudio                2.2.0.dev20231204+cu118          pypi_0    pypi
[conda] torchelastic              0.2.2                    pypi_0    pypi
[conda] torchtriton               2.1.0                     py310    pytorch
[conda] torchvision               0.17.0.dev20231204+cu118          pypi_0    pypi

But it is somewhat redundant: as mentioned, I reproduced the error starting from the official PyTorch Docker image plus the nightly wheels.

@atalman
Contributor
atalman commented Dec 4, 2023

@bhack could you please post more details of the repro? We run torch.compile tests on nightly and they pass here:
https://github.com/pytorch/builder/actions/runs/7088836647/job/19292246062#step:11:17171

@bhack
Contributor Author
bhack commented Dec 4, 2023

The test you have pointed to is 12.1, right? We are talking about the 11.8 wheels.

@atalman wrote: no, there are multiple 11.8 wheel tests if you look here: https://github.com/pytorch/builder/actions/runs/7088836647/job/19292255936

@malfet
Contributor
malfet commented Dec 4, 2023

@bhack just to confirm: this sort of error does not occur with the older Triton from the 2.1.x releases (where TRITON_PTX_PATH was a viable workaround)?

@malfet malfet removed the needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user label Dec 4, 2023
@bhack
Contributor Author
bhack commented Dec 4, 2023

What do you mean?

@malfet
Contributor
malfet commented Dec 4, 2023

If you try to use PyTorch 2.1 on your system, torch.compile does not result in an undefined-symbol error? (And one can manually replace the ptxas embedded in Triton with the one from the CUDA 11.x toolkit.)
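
A rough sketch of that ptxas swap (the bundled-ptxas location and the CUDA 11.8 install path below are assumptions, not verified against a specific Triton release):

import os
import shutil
import triton

# Assumed location of the ptxas that ships inside the Triton wheel.
bundled_ptxas = os.path.join(os.path.dirname(triton.__file__),
                             "third_party", "cuda", "bin", "ptxas")
# Assumed location of a locally installed CUDA 11.8 toolkit.
system_ptxas = "/usr/local/cuda-11.8/bin/ptxas"

shutil.copy2(system_ptxas, bundled_ptxas)  # overwrite the bundled ptxas
print("replaced", bundled_ptxas)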

@bhack
Contributor Author
bhack commented Dec 4, 2023

Yes, it is specific to nightly, but I have not bisected the commit.

@malfet
Contributor
malfet commented Dec 4, 2023

I'm almost certain it's 8a90249, which included triton-lang/triton#2638

@bhack
Contributor Author
bhack commented Dec 4, 2023

I need to test nightly because I have a compiler bug to verify, and I don't want to open a new ticket if it was already solved in the PyTorch nightly (or its updated Triton dependency).

@bhack
Contributor Author
bhack commented Dec 5, 2023

@atalman wrote: no, there are multiple 11.8 wheel tests if you look here: https://github.com/pytorch/builder/actions/runs/7088836647/job/19292255936

I can't find the compiler test there.

@bhack: Here it is: https://github.com/pytorch/builder/actions/runs/7088836647/job/19292255936#step:11:16568

@bhack
Contributor Author
bhack commented Dec 5, 2023

@atalman Just to be on the same page: what is the container image of that CI test run?

@bhack
Contributor Author
bhack commented Dec 5, 2023

If I am reading your last link correctly, we are running that test on a CPU image:
https://github.com/pytorch/builder/actions/runs/7088836647/job/19292255936#step:11:67

@shunting314 shunting314 added triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module and removed triage review labels Dec 5, 2023
@malfet malfet self-assigned this Dec 5, 2023
@malfet
Contributor
malfet commented Dec 5, 2023

Let me try to propose a fix on the Triton side.

@bhack
Contributor Author
bhack commented Dec 5, 2023

@malfet Are we not testing torch.compile on a GPU image in the CI?

@malfet
Contributor
malfet commented Dec 5, 2023

@bhack torch.compile is tested by CI (not sure how that's relevant to this regression, though)

@bhack
Contributor Author
bhack commented Dec 5, 2023

I want to know if the CI is testing 11.x torch.compile in a GPU image/container.

@malfet
Contributor
malfet commented Dec 6, 2023

I want to know if the CI is testing 11.x torch.compile in a GPU image/container.

Yes and no: the torch.compile feature of the cuda-11.8 builds is tested, but the containers pass through the host NVIDIA driver, which is 535.54.03 across the whole CI fleet; see here: https://github.com/pytorch/builder/actions/runs/7088836647/job/19292250293#step:11:16925

@bhack
Contributor Author
bhack commented Dec 6, 2023

Yes, thanks. I looked at this a few minutes ago and I always see nvidia-smi output with CUDA 12.x.
How can we reliably deliver 11.x wheels if we don't have the right coverage in the CI?
Isn't it better to just deliver 12.x wheels?
Also, we are not producing an official Docker image for 11.x anymore, and we are forcing Docker users to upgrade the driver with patch releases (probably not the best approach).
See what we have already discussed on the 2.1.1 validation #112180 (comment).

@malfet
Contributor
malfet commented Dec 8, 2023

This problem should be fixed with the latest Triton; should we update the pin again?

@bhack
Contributor Author
bhack commented Dec 8, 2023

For me, yes, but it is risky to keep delivering 11.x wheels while the CI (and probably the Triton upstream CI?) uses only 12.x drivers.
One of the main rationales for still using 11.x wheels is being on a system or cloud-provider setup where you cannot upgrade the driver.

@akihironitta
Contributor

I tried PyTorch 2.2 RC1 and hit the same issue. It'd be great if the fix were cherry-picked into 2.2.

@bhack
Contributor Author
bhack commented Dec 11, 2023

This problem should be fixed with the latest Triton; should we update the pin again?

@malfet Is it landing with #115529?

Edit: it will require a bit of work because of the refactoring of the triton.compile interface.

@masnesral
Contributor

I'm trying to help bump old issues this week and I'm not sure what to do with this one. Does anyone here know whether there's still an issue?

@malfet
Contributor
malfet commented Jul 16, 2024

Looks like a new symbol was added to Triton that needs a similar workaround...
