Undefined symbol: cuOccupancyMaxActiveClusters · Issue #115075 · pytorch/pytorch · GitHub

Undefined symbol: cuOccupancyMaxActiveClusters #115075

Open
bhack opened this issue Dec 4, 2023 · 28 comments
Labels
oncall: pt2 triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@bhack
Contributor
bhack commented Dec 4, 2023

🐛 Describe the bug

torch.compile is failing on nightly.
Starting from the Docker image pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel and then running:

pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118

torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
 ImportError: /tmp/torchinductor_root/triton/0/4fcaf02726b7cedfebb83bbf158e9de6/cuda_utils.so: undefined symbol: cuOccupancyMaxActiveClusters
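
For reference, any small module pushed through the inductor backend is enough to hit this at its first compiled call; the model below is just an assumed minimal sketch, not the original workload:

import torch

# Any tiny module suffices: inductor compiles Triton kernels whose
# cuda_utils.so references cuOccupancyMaxActiveClusters at load time.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).cuda()
compiled = torch.compile(model, backend="inductor")

x = torch.randn(4, 8, device="cuda")
print(compiled(x).shape)  # first call triggers compilation and the ImportError above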

Versions

torch-2.2.0.dev20231204+cu118-cp310-cp310-linux_x86_64.whl

cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519

@bhack
Contributor Author
bhack commented Dec 4, 2023

Are we sure that we can still deliver CUDA 11.x wheels?

triton-lang/triton#2062

@malfet malfet added needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user oncall: pt2 labels Dec 4, 2023
@malfet
Contributor
malfet commented Dec 4, 2023

@bhack can you please run python3 -m torch.utils.collect_env and post the results here (though I guess I already know the answer: you have a 450.xx driver or something)

@bhack
Contributor Author
bhack commented Dec 4, 2023

We are forced on GKE Autopilot to stay on the 470.x driver; we have no other possible choice.

@bhack
Contributor Author
bhack commented Dec 4, 2023

Also, why is the cuOccupancyMaxActiveClusters symbol undefined? Is it available only in CUDA 12.x?
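
One way to check whether the installed driver actually exports the symbol is to probe libcuda directly; a minimal sketch, assuming libcuda.so.1 is on the loader path:

import ctypes

# Newer drivers export cuOccupancyMaxActiveClusters (thread-block-cluster
# occupancy query); the 470.x / CUDA 11.4 driver series does not, which is
# why the generated cuda_utils.so fails to load against it.
libcuda = ctypes.CDLL("libcuda.so.1")
print(hasattr(libcuda, "cuOccupancyMaxActiveClusters"))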

@bhack
Contributor Author
bhack commented Dec 4, 2023

in any case:

[pip3] numpy==1.26.0
[pip3] pytorch-triton==2.1.0+bcad9dabe1
[pip3] torch==2.2.0.dev20231204+cu118
[pip3] torchaudio==2.2.0.dev20231204+cu118
[pip3] torchelastic==0.2.2
[pip3] torchvision==0.17.0.dev20231204+cu118
[pip3] triton==2.1.0
[conda] blas                      1.0                         mkl  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] libjpeg-turbo             2.0.0                h9bf148f_0    pytorch
[conda] mkl                       2023.1.0         h213fc3f_46343  
[conda] mkl-service               2.4.0           py310h5eee18b_1  
[conda] mkl_fft                   1.3.8           py310h5eee18b_0  
[conda] mkl_random                1.2.4           py310hdb19cb5_0  
[conda] numpy                     1.26.0          py310h5f9d8c6_0  
[conda] numpy-base                1.26.0          py310hb5e798b_0  
[conda] pytorch-cuda              11.8                 h7e8668a_5    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pytorch-triton            2.1.0+bcad9dabe1          pypi_0    pypi
[conda] torch                     2.2.0.dev20231204+cu118          pypi_0    pypi
[conda] torchaudio                2.2.0.dev20231204+cu118          pypi_0    pypi
[conda] torchelastic              0.2.2                    pypi_0    pypi
[conda] torchtriton               2.1.0                     py310    pytorch
[conda] torchvision               0.17.0.dev20231204+cu118          pypi_0    pypi

But it is somewhat redundant: as mentioned, I reproduced the error starting from the official PyTorch Docker image plus the nightly wheels.

@atalman
Contributor
atalman commented Dec 4, 2023

@bhack could you please post more details of the repro? We run torch.compile tests on nightly and they pass here:
https://github.com/pytorch/builder/actions/runs/7088836647/job/19292246062#step:11:17171

@bhack
Contributor Author
bhack commented Dec 4, 2023

The test you have pointed to is 12.1, right? We are talking about the 11.8 wheels.

@atalman wrote: no, there are multiple 11.8 wheel tests if you look here: https://github.com/pytorch/builder/actions/runs/7088836647/job/19292255936

@malfet
Contributor
malfet commented Dec 4, 2023

@bhack just to confirm: this sort of error does not occur with the older Triton from the 2.1.x releases (where TRITON_PTX_PATH was a viable workaround)?

@malfet malfet removed the needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user label Dec 4, 2023
@bhack
Contributor Author
bhack commented Dec 4, 2023

What do you mean?

@malfet
Contributor
malfet commented Dec 4, 2023

If you try to use PyTorch 2.1 on your system, torch.compile does not result in an undefined-symbol error? (And one can manually replace the ptxas embedded in Triton with the one from the CUDA 11.x toolkit.)
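
A rough sketch of that ptxas swap (the bundled-ptxas location and the CUDA 11.8 install path below are assumptions, not verified against a specific Triton release):

import os
import shutil
import triton

# Assumed location of the ptxas that ships inside the Triton wheel.
bundled_ptxas = os.path.join(os.path.dirname(triton.__file__),
                             "third_party", "cuda", "bin", "ptxas")
# Assumed location of a locally installed CUDA 11.8 toolkit.
system_ptxas = "/usr/local/cuda-11.8/bin/ptxas"

shutil.copy2(system_ptxas, bundled_ptxas)  # overwrite the bundled ptxas
print("replaced", bundled_ptxas)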

@bhack
Contributor Author
bhack commented Dec 4, 2023

Yes, it is specific to nightly, but I have not bisected the commit.

@malfet
Contributor
malfet commented Dec 4, 2023

I'm almost certain it's 8a90249, which included triton-lang/triton#2638

@bhack
Contributor Author
bhack commented Dec 4, 2023

I need to test nightly because I have a compiler bug to verify, and I don't want to open a new ticket if it was already solved in the PyTorch nightly (or its updated Triton dependency).

@bhack
Contributor Author
bhack commented Dec 5, 2023

@atalman wrote: no, there are multiple 11.8 wheel tests if you look here: https://github.com/pytorch/builder/actions/runs/7088836647/job/19292255936

I can't find the compiler test there.

@bhack: Here it is: https://github.com/pytorch/builder/actions/runs/7088836647/job/19292255936#step:11:16568

@bhack
Contributor Author
bhack commented Dec 5, 2023

@atalman Just to be on the same page: what is the container image of that CI test run?

@bhack
Contributor Author
bhack commented Dec 5, 2023

If I am reading your last link correctly, we are running that test on a CPU image:
https://github.com/pytorch/builder/actions/runs/7088836647/job/19292255936#step:11:67

@shunting314 shunting314 added triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module and removed triage review labels Dec 5, 2023
@malfet malfet self-assigned this Dec 5, 2023
@malfet
Contributor
malfet commented Dec 5, 2023

Let me try to propose a fix on the Triton side.

@bhack
Contributor Author
bhack commented Dec 5, 2023

@malfet Are we not testing torch.compile on a GPU image in the CI?

@malfet
Contributor
malfet commented Dec 5, 2023

@bhack torch.compile is tested by CI (not sure how that's relevant to this regression, though)

@bhack
Contributor Author
bhack commented Dec 5, 2023

I want to know if the CI is testing 11.x torch.compile in a GPU image/container.

@malfet
Contributor
malfet commented Dec 6, 2023

I want to know if the CI is testing 11.x torch.compile in a GPU image/container.

Yes and no: the torch.compile feature of the cuda-11.8 builds is tested, but the containers pass through the host NVIDIA driver, which is 535.54.03 across the whole CI fleet; see here: https://github.com/pytorch/builder/actions/runs/7088836647/job/19292250293#step:11:16925

@bhack
Contributor Author
bhack commented Dec 6, 2023

Yes, thanks. I looked at this a few minutes ago and I always see nvidia-smi output with CUDA 12.x.
How can we reliably deliver 11.x wheels if we don't have the right coverage in the CI?
Isn't it better to just deliver 12.x wheels?
Also, we are not producing an official Docker image for 11.x anymore, and we are forcing Docker users to upgrade the driver with patch releases (probably not the best approach).
See what we have already discussed on the 2.1.1 validation #112180 (comment).

@malfet
Contributor
malfet commented Dec 8, 2023

This problem should be fixed with the latest Triton; should we update the pin again?

@bhack
Contributor Author
bhack commented Dec 8, 2023

For me, yes, but it is risky to keep delivering 11.x wheels while the CI (and probably the Triton upstream CI?) uses only 12.x drivers.
One of the main rationales for still using 11.x wheels is being on a system or cloud-provider setup where you cannot upgrade the driver.

@akihironitta
Contributor

I tried PyTorch 2.2 RC1 and hit the same issue. It'd be great if the fix were cherry-picked into 2.2.

@bhack
Contributor Author
bhack commented Dec 11, 2023

This problem should be fixed with the latest Triton; should we update the pin again?

@malfet Is it landing with #115529?

Edit: it will require a bit of work because of the refactoring of the triton.compile interface.

@masnesral
Contributor

I'm trying to help bump old issues this week and I'm not sure what to do with this one. Does anyone here know whether there's still an issue?

@malfet
Contributor
malfet commented Jul 16, 2024

Looks like a new symbol was added to Triton that needs a similar workaround...
