test_scatter_bf16_cuda fails on V100 · Issue #118581 · pytorch/pytorch

Closed · Tracked by #130151
malfet opened this issue Jan 29, 2024 · 8 comments
Labels: high priority, module: inductor, oncall: pt2, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Milestone: 2.4.0

@malfet
Contributor
malfet commented Jan 29, 2024

🐛 Describe the bug

While running inductor CI on a V100, I found that the above-mentioned test fails with unsupported PTX instructions:

% python3 inductor/test_torchinductor.py -v -k test_scatter_bf16_cuda
...
RuntimeError: Internal Triton PTX codegen error: 
ptxas /tmp/compile-ptx-src-f5ac42, line 48; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-f5ac42, line 48; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-f5ac42, line 52; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-f5ac42, line 52; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas fatal   : Ptx assembly aborted due to errors
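
For reference, a minimal sketch of the failing pattern (not the exact test body; shapes and values are illustrative): in eager mode the bf16 scatter works because the type is emulated, but `torch.compile` hands the kernel to Triton, which emits bf16 PTX that sm_70 cannot assemble.

```python
import torch

def scatter_bf16(dst, index, src):
    # Scatter bf16 values along dim 0 -- the op exercised by the failing test.
    return dst.scatter(0, index, src)

dst = torch.zeros(3, 5, dtype=torch.bfloat16, device="cuda")
index = torch.tensor([[0, 1, 2, 0, 0]], device="cuda")
src = torch.ones(1, 5, dtype=torch.bfloat16, device="cuda")

scatter_bf16(dst, index, src)                 # eager: works, bf16 is emulated on Volta
torch.compile(scatter_bf16)(dst, index, src)  # V100 (pre-fix): ptxas error shown above
```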

Versions

2.2, nightly

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @bdhirsh @anijain2305 @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @muchulee8 @aakhundov @ColinPeppler

@tringwald
Collaborator

Probably related to #118122.

@eellison eellison added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jan 30, 2024
@eellison
Contributor

@malfet, still working on this?

@anijain2305
Contributor

Comments from triage meeting

  • V100 has limited support for bfloat16. Since Triton uses more advanced intrinsics, should we fall back?
  • Conclusion - fall back only for a few ops, like scatter (a user-level sketch of such a fallback follows below).
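
A hedged, user-level sketch of the kind of per-op fallback discussed above, using only stock PyTorch APIs (this is not the Inductor-internal mechanism): perform the scatter in float32 on pre-sm_80 GPUs and cast the result back.

```python
import torch

def scatter_bf16_safe(dst, dim, index, src):
    # Workaround sketch: on CUDA devices without native bf16 (compute capability < 8.0),
    # run the scatter in float32 and cast the result back to bfloat16.
    if (dst.is_cuda and dst.dtype == torch.bfloat16
            and torch.cuda.get_device_capability(dst.device)[0] < 8):
        return dst.float().scatter(dim, index, src.float()).to(torch.bfloat16)
    return dst.scatter(dim, index, src)
```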

@masnesral
Contributor

We're doing the weekly check-in on hi-pri issues that haven't been updated in a month. @malfet, any update?

@zou3519 zou3519 added this to the 2.4.0 milestone Jun 4, 2024
@malfet
Contributor Author
malfet commented Jun 11, 2024

This sounds like a duplicate of #118122

malfet added a commit that referenced this issue Jun 22, 2024
Voltas do not have HW support for the bfloat16 datatype, but the type is emulated in software, so PyTorch eager can use bfloat16 tensors while Triton cannot.
So if a graph with either CUDA bf16 input or output tensors is compiled, raise a warning and skip the frame.

Fixes #118122 and #118581
pytorchmergebot pushed a commit that referenced this issue Jun 25, 2024
Volta (sm_7x) does not have HW support for the bfloat16 datatype; it is only emulated in software, so PyTorch eager can use bfloat16 tensors, but Triton cannot. So if a graph with either CUDA bf16 input or output tensors is compiled, raise a warning and skip the frame.

Add an optional parameter `including_emulation` to the `torch.cuda.is_bf16_supported` method and call it from `torch._inductor.compile_fx._check_triton_bf16_support`.

Test plan: modify `is_bf16_supported` to return False and check that the warning is generated.

Fixes #118122 and #118581

Pull Request resolved: #129288
Approved by: https://github.com/eqy, https://github.com/jansel
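
A hedged usage sketch of the new parameter described in this PR (behavior per the description above, available once #129288 landed):

```python
import torch

if torch.cuda.is_available():
    # Default includes software emulation, so this is True even on Volta/Turing.
    print(torch.cuda.is_bf16_supported())
    # Native-only check used by Inductor's Triton gate: False on sm_70/sm_75 parts.
    print(torch.cuda.is_bf16_supported(including_emulation=False))
```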
@malfet
Contributor Author
malfet commented Jun 25, 2024

Closing, this was fixed by skipping compilation on Voltas for the bf16 dtype.

@malfet malfet closed this as completed Jun 25, 2024
pytorchbot pushed a commit that referenced this issue Jun 27, 2024
Volta (sm_7x) does not have HW support for the bfloat16 datatype; it is only emulated in software, so PyTorch eager can use bfloat16 tensors, but Triton cannot. So if a graph with either CUDA bf16 input or output tensors is compiled, raise a warning and skip the frame.

Add an optional parameter `including_emulation` to the `torch.cuda.is_bf16_supported` method and call it from `torch._inductor.compile_fx._check_triton_bf16_support`.

Test plan: modify `is_bf16_supported` to return False and check that the warning is generated.

Fixes #118122 and #118581

Pull Request resolved: #129288
Approved by: https://github.com/eqy, https://github.com/jansel

(cherry picked from commit 14dc08d)
atalman pushed a commit that referenced this issue Jun 28, 2024
Inductor to fail gracefully on Voltas for bf16 tensors (#129288)

Volta (sm_7x) does not have HW support for the bfloat16 datatype; it is only emulated in software, so PyTorch eager can use bfloat16 tensors, but Triton cannot. So if a graph with either CUDA bf16 input or output tensors is compiled, raise a warning and skip the frame.

Add an optional parameter `including_emulation` to the `torch.cuda.is_bf16_supported` method and call it from `torch._inductor.compile_fx._check_triton_bf16_support`.

Test plan: modify `is_bf16_supported` to return False and check that the warning is generated.

Fixes #118122 and #118581

Pull Request resolved: #129288
Approved by: https://github.com/eqy, https://github.com/jansel

(cherry picked from commit 14dc08d)

Co-authored-by: Nikita Shulga <nshulga@meta.com>
@atalman
Contributor
atalman commented Jul 19, 2024

Validated in Colab:

2.4.0+cu121 _CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15102MB, multi_processor_count=40) True
/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
  warnings.warn(
tensor([[1., 0., 0., 1., 1.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.]], device='cuda:0', dtype=torch.bfloat16)
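
A hedged reconstruction of the validation snippet (the exact Colab cell isn't shown in the issue; names and shapes are inferred from the printed output):

```python
import torch

print(torch.__version__, torch.cuda.get_device_properties(0), torch.cuda.is_bf16_supported())

def scatter_bf16(dst, index, src):
    return dst.scatter(0, index, src)

dst = torch.zeros(3, 5, dtype=torch.bfloat16, device="cuda")
index = torch.tensor([[0, 1, 2, 0, 0]], device="cuda")
src = torch.ones(1, 5, dtype=torch.bfloat16, device="cuda")

# On T4 (sm_75) this now warns "does not support bfloat16 compilation natively, skipping"
# and runs the graph in eager mode instead of failing in ptxas.
print(torch.compile(scatter_bf16)(dst, index, src))
```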

@steveepreston

UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping

same error
