[xpu] set aot device flags in cpp_extension #149459

jingxu10 · 2025-03-18T22:49:16Z

If PyTorch is compiled with only AOT text strings starting with "dg2", the _get_sycl_arch_list() function will pass an empty string to -device argument of ocloc and then cause a compilation crash.
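The failure mode described above can be illustrated with a small sketch. The helper names `old_arch_string` and `old_device_flag` and the sample arch names are hypothetical; only the dg2 filtering expression and the `-Xs "-device ..."` format come from the diff in this PR. The sketch assumes the `-device` option was emitted regardless of whether any arch remained after filtering.

```python
def old_arch_string(arch_list):
    """Mimic the pre-fix behavior: drop dg2* archs and join the rest."""
    kept = [x for x in arch_list if not x.startswith('dg2')]
    return ','.join(kept)


def old_device_flag(arch_list):
    # Assumption: the -device option is always emitted, even when
    # the filtered arch string is empty.
    return f'-Xs "-device {old_arch_string(arch_list)}"'


# Sample arch lists, chosen for illustration only.
mixed = old_device_flag(['pvc', 'dg2-g10'])
only_dg2 = old_device_flag(['dg2-g10', 'dg2-g11'])

print(mixed)     # a usable device list
print(only_dg2)  # the -device argument is left empty
```

With only dg2 archs in the build, nothing survives the filter, so the `-device` argument handed to `ocloc` is empty.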

pytorch-bot · 2025-03-18T22:49:19Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149459


❌ 4 New Failures

As of commit 78de53a with merge base d7f3cd0:

NEW FAILURES - The following jobs have failed:

xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 1, 4, linux.idc.xpu)
    inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow_by_natural_log2_dynamic_shapes_dynamic_shapes_xpu
xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 2, 4, linux.idc.xpu)
    export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_gpu_cpp_serdes
xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 3, 4, linux.idc.xpu)
    export/test_retraceability.py::RetraceExportTestExport::test_device_to_gpu_retraceability_strict
xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 4, 4, linux.idc.xpu)
    inductor/test_torchinductor.py::GPUTests::test_pow_by_natural_log2_dynamic_shapes_xpu

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jingxu10 · 2025-03-18T23:09:21Z

@pytorchbot label "topic: not user facing"

torch/utils/cpp_extension.py

guangyey

Others LGTM.

guangyey · 2025-03-19T07:35:51Z

torch/utils/cpp_extension.py

@@ -284,25 +284,26 @@ def _join_sycl_home(*paths) -> str:

 _COMMON_SYCL_FLAGS = [
    '-fsycl',
-    '-fsycl-targets=spir64_gen,spir64',


As I recall, -fsycl-targets=spir64_gen,spir64 is a kernel flag as well.

According to https://github.com/intel/llvm/blob/sycl/sycl/doc/UsersManual.md, it is OK to remove this from the common flags:

Normally, '-fsycl-targets' is specified when linking an application, in which case the AOT compiled device binaries are embedded within the application’s fat executable. However, this option may also be used in combination with '-c' and '-fno-sycl-rdc' when compiling a source file. In this case, the AOT compiled device binaries are embedded within the fat object file.

guangyey · 2025-03-19T07:39:21Z

torch/utils/cpp_extension.py

+        return ['-fsycl-targets=spir64_gen,spir64',
+                f'-Xs "-device {",".join(arch_list)}"']


Suggested change

-        return ['-fsycl-targets=spir64_gen,spir64',
-                f'-Xs "-device {",".join(arch_list)}"']
+        return ['-fsycl-targets=spir64_gen,spir64',
+                '-flink-huge-device-code',
+                f'-Xs "-device {",".join(arch_list)}"']

'-flink-huge-device-code' is only a link flag, right?

guangyey · 2025-03-19T07:40:22Z

torch/utils/cpp_extension.py

-def _get_sycl_arch_list():
-    if 'TORCH_XPU_ARCH_LIST' in os.environ:
-        return os.environ.get('TORCH_XPU_ARCH_LIST')
+def _get_sycl_arch_flag():


Suggested change

-def _get_sycl_arch_flag():
+def _get_sycl_arch_flags():

sorry.

jingxu10 · 2025-03-19T08:10:17Z

torch/utils/cpp_extension.py

 ]
+_SYCL_DLINK_FLAGS += _get_sycl_arch_flag()


@guangyey would you suggest a commit to change this flag to flags as well?

torch/utils/cpp_extension.py

dvrogozh · 2025-03-19T15:05:31Z

torch/utils/cpp_extension.py

    arch_list = torch.xpu.get_arch_list()
    # Dropping dg2* archs since they lack hardware support for fp64 and require
    # special consideration from the user. If needed these platforms can
    # be requested thru TORCH_XPU_ARCH_LIST environment variable.
    arch_list = [x for x in arch_list if not x.startswith('dg2')]
-    return ','.join(arch_list)
+    if len(arch_list) == 0:


This check is the only change relevant to the declared purpose of the PR. All other changes, such as the adjustment of _COMMON_SYCL_FLAGS and _SYCL_DLINK_FLAGS and the addition of more flags to _get_sycl_arch_flags, are outside the declared scope of the PR per its title and description. @jingxu10: can you please adjust them? In particular, I am looking for a description change that clarifies why you are making all these other changes.

When setting an empty string as the aot target we don't want to perform aot compilation with ocloc. In this case, neither -fsycl-targets=spir64_gen,spir64 nor -device should be set. Otherwise, ocloc will crash during compilation. This is the reason why we move the setting of -fsycl-targets=spir64_gen,spir64 and -device into the condition when we would like to run AOT compilation with targets.
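The reasoning above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the merged implementation: the name `get_sycl_arch_flags` and the explicit `arch_list` parameter are hypothetical (the PR's function reads the list from `torch.xpu.get_arch_list()`), and the flag strings are copied from the diff hunks in this review.

```python
def get_sycl_arch_flags(arch_list):
    """Return AOT flags only when at least one target remains."""
    # Same dg2 filter as in the diff under review.
    kept = [x for x in arch_list if not x.startswith('dg2')]
    if len(kept) == 0:
        # No AOT target: emit neither -fsycl-targets nor -device,
        # so device code is left for compilation at runtime.
        return []
    return ['-fsycl-targets=spir64_gen,spir64',
            f'-Xs "-device {",".join(kept)}"']


flags_empty = get_sycl_arch_flags(['dg2-g10'])
flags_pvc = get_sycl_arch_flags(['pvc', 'dg2-g10'])
print(flags_empty)
print(flags_pvc)
```

Keeping both flags behind the same condition means they are either set together or not at all.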

dvrogozh · 2025-03-19T15:26:13Z

torch/utils/cpp_extension.py

    arch_list = torch.xpu.get_arch_list()
    # Dropping dg2* archs since they lack hardware support for fp64 and require
    # special consideration from the user. If needed these platforms can
    # be requested thru TORCH_XPU_ARCH_LIST environment variable.
    arch_list = [x for x in arch_list if not x.startswith('dg2')]
-    return ','.join(arch_list)
+    if len(arch_list) == 0:
+        return []


What will the behavior of the built extension be if we return an empty [] arch flags list?

Will it actually be buildable?

For which arch(s) will it be built?

Or will it not be pre-built for any AOT target, so that running the extension results in runtime compilation?

Secondly, does '-fsycl-targets=spir64_gen,spir64' still need to be passed here?

I think the empty arch list is worth a comment in the source code here.

Compilation will still work, but without AOT compilation with ocloc. As you mentioned in point 3, it won't be pre-built for any AOT target, and running the extension will result in runtime compilation.
-fsycl-targets=spir64_gen,spir64 cannot be set here; otherwise ocloc will crash, complaining that no targets are set.

dvrogozh · 2025-03-19T19:18:52Z

torch/utils/cpp_extension.py

+    if len(arch_list) == 0:
+        return []
+    else:
+        return ['-fsycl-targets=spir64_gen,spir64',


Moving these flags here does not seem to work correctly. With this change, the following 2 warnings appear. I suggest you drop these flags from the PR and handle them separately, if needed.

# python -m pytest test/test_cpp_extensions_jit.py -k xpu
...
test/test_cpp_extensions_jit.py
[1/4] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=inline_jit_extension_xpu -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1018\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -c /home/dvrogozh/.cache/torch_extensions/py312_cpu/inline_jit_extension_xpu/main.cpp -o main.o
[2/4] icpx -DTORCH_EXTENSION_NAME=inline_jit_extension_xpu -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1018\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -fsycl -sycl-std=2020 -fsycl-host-compiler=c++ '-fsycl-host-compiler-options=-DTORCH_EXTENSION_NAME=inline_jit_extension_xpu -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\\"_gcc\\" -DPYBIND11_STDLIB=\\"_libstdcpp\\" -DPYBIND11_BUILD_ABI=\\"_cxxabi1018\\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17' -c -x c++ /home/dvrogozh/.cache/torch_extensions/py312_cpu/inline_jit_extension_xpu/sycl.sycl -o sycl.sycl.o
[3/4] icpx main.o sycl.sycl.o -o sycl_dlink.o -fsycl -fsycl-link --offload-compress -fsycl-targets=spir64_gen,spir64 -flink-huge-device-code -Xs "-device pvc"
icpx: warning: linked binaries do not contain expected 'spir64_gen-unknown-unknown' target; found targets: 'spir64-unknown-unknown' [-Wsycl-target]
icpx: warning: argument unused during compilation: '-flink-huge-device-code' [-Wunused-command-line-argument]

Do these 2 warning messages appear with empty aot or non-empty aot?

These 2 warning messages don't make sense to me. If arch_list is not empty, the flags passed to compilation should be exactly the same as before these changes. If arch_list is empty, neither the spir64 targets nor -flink-huge-device-code will be added to the flags.

These appear on pytorch initially built with TORCH_XPU_ARCH_LIST=pvc, i.e. on non-empty arch list. You have the pytest cmdline above to try it yourself: python -m pytest test/test_cpp_extensions_jit.py -k xpu. Please, make sure to find a root cause and remove the warning before this PR can be merged.
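When chasing warnings like these, it can help to pull them out of captured build output automatically. The following is a minimal sketch; the helper `find_icpx_warnings` and its regular expression are hypothetical and not part of PyTorch or its test suite. The sample text reuses the two warning lines quoted in this thread.

```python
import re

# Matches lines such as: "icpx: warning: <message> [-Wsome-flag]"
WARNING_RE = re.compile(
    r'^icpx: warning: (?P<msg>.*?)(?: \[(?P<flag>-W[\w-]+)\])?$'
)


def find_icpx_warnings(build_output):
    """Return (warning_flag, message) pairs found in build output."""
    found = []
    for line in build_output.splitlines():
        m = WARNING_RE.match(line.strip())
        if m:
            found.append((m.group('flag'), m.group('msg')))
    return found


sample = """\
[3/4] icpx main.o sycl.sycl.o -o sycl_dlink.o -fsycl -fsycl-link
icpx: warning: linked binaries do not contain expected 'spir64_gen-unknown-unknown' target; found targets: 'spir64-unknown-unknown' [-Wsycl-target]
icpx: warning: argument unused during compilation: '-flink-huge-device-code' [-Wunused-command-line-argument]
"""

warnings_found = find_icpx_warnings(sample)
for flag, msg in warnings_found:
    print(flag, '->', msg)
```

A check of this kind could be run over the output of the pytest command above before and after a flag change.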

jingxu10 · 2025-03-24T20:08:22Z

@pytorchbot rebase

pytorchmergebot · 2025-03-24T20:09:57Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-03-24T20:10:00Z

Successfully rebased jingxu10/cpp_extension_aot_main onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout jingxu10/cpp_extension_aot_main && git pull --rebase)

jingxu10 · 2025-04-03T01:26:41Z

@pytorchbot merge -f "lint is green "

pytorchmergebot · 2025-04-03T01:28:18Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


pytorchmergebot · 2025-04-03T01:28:33Z

Merge failed

Reason: Approvers from one of the following sets are needed:

superuser (pytorch/metamates)
Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

EikanWang · 2025-04-08T01:57:30Z

@pytorchbot rebase -b main

pytorchmergebot · 2025-04-08T01:59:02Z

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>

Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

pytorchmergebot · 2025-04-08T01:59:05Z

Successfully rebased jingxu10/cpp_extension_aot_main onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout jingxu10/cpp_extension_aot_main && git pull --rebase)

dvrogozh · 2025-04-24T13:45:26Z

@EikanWang, @guangyey: is the XPU CI fixed now, so that this PR can be rebased and merged? I have a follow-up change ready that adds a test for this change; it has been pending on this merge for a while.

malfet · 2025-04-24T22:53:22Z

torch/utils/cpp_extension.py

+# will be JIT compiled at runtime.
+_COMMON_SYCL_FLAGS = [
+    '-fsycl',
+    '-fsycl-targets=spir64_gen,spir64' if _get_sycl_arch_list() != '' else '',


It's pretty bad to do something like that for a global variable, since, at least for CUDA, it results in device initialization.

I hope #152192 will address that.

malfet · 2025-04-24T22:53:55Z

@pytorchbot merge -f "Lint is green, XPU is red"

pytorchmergebot · 2025-04-24T22:55:35Z

Merge started


If PyTorch is compiled with only AOT text strings starting with "dg2", the `_get_sycl_arch_list()` function will pass an empty string to `-device` argument of `ocloc` and then cause a compilation crash.

Pull Request resolved: pytorch#149459
Approved by: https://github.com/guangyey, https://github.com/dvrogozh, https://github.com/malfet
Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

dvrogozh · 2025-04-25T15:47:24Z

@jingxu10 what is the test plan? Can it be automated (i.e., do we have test_cpp_extension running for XPU)?

I opened #152192 to add such a test.


jingxu10 requested review from fmassa, soumith and ezyang as code owners March 18, 2025 22:49

jingxu10 force-pushed the jingxu10/cpp_extension_aot_main branch from 8ba2d05 to fc809aa Compare March 18, 2025 22:52

jingxu10 requested review from EikanWang and guangyey March 18, 2025 22:52

pytorchbot added the open source label Mar 18, 2025

pytorch-bot bot added the topic: not user facing topic category label Mar 18, 2025

etaf added the ciflow/xpu Run XPU CI tasks label Mar 19, 2025

guangyey reviewed Mar 19, 2025

guangyey reviewed Mar 19, 2025

guangyey reviewed Mar 19, 2025

guangyey approved these changes Mar 19, 2025

guangyey added this to PyTorch Intel Mar 19, 2025

guangyey reviewed Mar 19, 2025

guangyey self-requested a review March 19, 2025 07:40

jingxu10 commented Mar 19, 2025

dvrogozh suggested changes Mar 19, 2025

dvrogozh reviewed Mar 19, 2025

jingxu10 force-pushed the jingxu10/cpp_extension_aot_main branch from 7475354 to 68c559c Compare March 24, 2025 00:55

guangyey approved these changes Mar 24, 2025

pytorchmergebot force-pushed the jingxu10/cpp_extension_aot_main branch from 68c559c to fedcd0f Compare March 24, 2025 20:10

pytorchmergebot force-pushed the jingxu10/cpp_extension_aot_main branch from feb8d5b to 876aa68 Compare April 1, 2025 01:44

pytorchmergebot added the merging label Apr 3, 2025

pytorchmergebot removed the merging label Apr 3, 2025

jingxu10 and others added 6 commits April 8, 2025 01:59

fix a bug in cpp extension to set aot device flags

d2f5fb3

update

406e539

Apply suggestions from code review

ba12fbb

Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>

update

7157f24

correct flags configuration

f2c02e7

Update torch/utils/cpp_extension.py

78de53a

Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

pytorchmergebot force-pushed the jingxu10/cpp_extension_aot_main branch from 876aa68 to 78de53a Compare April 8, 2025 01:59

malfet approved these changes Apr 24, 2025

pytorchmergebot added the merging label Apr 24, 2025

pytorchmergebot closed this in 2089b22 Apr 24, 2025

pytorchmergebot added the Merged label Apr 24, 2025

github-project-automation bot moved this to Done in PyTorch Intel Apr 24, 2025

pytorchmergebot removed the merging label Apr 24, 2025

dvrogozh mentioned this pull request May 9, 2025

[RFC][API-Unstable] Support 3rd party SYCL kernels with CPP Extension API #153265

Open

11 tasks
