8000 [xpu] set aot device flags in cpp_extension by jingxu10 · Pull Request #149459 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

[xpu] set aot device flags in cpp_extension #149459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jingxu10
Copy link
Collaborator
@jingxu10 jingxu10 commented Mar 18, 2025

If PyTorch is compiled with only AOT text strings starting with "dg2", the _get_sycl_arch_list() function will pass an empty string to -device argument of ocloc and then cause a compilation crash.

Copy link
pytorch-bot bot commented Mar 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149459

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 78de53a with merge base d7f3cd0 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jingxu10 jingxu10 force-pushed the jingxu10/cpp_extension_aot_main branch from 8ba2d05 to fc809aa Compare March 18, 2025 22:52
@jingxu10 jingxu10 requested review from EikanWang and guangyey March 18, 2025 22:52
@jingxu10
Copy link
Collaborator Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Mar 18, 2025
@etaf etaf added the ciflow/xpu Run XPU CI tasks label Mar 19, 2025
Copy link
Collaborator
@guangyey guangyey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others LGTM.

@@ -284,25 +284,26 @@ def _join_sycl_home(*paths) -> str:

_COMMON_SYCL_FLAGS = [
'-fsycl',
'-fsycl-targets=spir64_gen,spir64',
Copy link
Collaborator
@guangyey guangyey Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-fsycl-targets=spir64_gen,spir64 is a kernel flag as well I remember.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to https://github.com/intel/llvm/blob/sycl/sycl/doc/UsersManual.md, it is OK to remove this from common flags.

Normally, '-fsycl-targets' is specified when linking an application, in
which case the AOT compiled device binaries are embedded within the
application’s fat executable.  However, this option may also be used in
combination with '-c' and '-fno-sycl-rdc' when compiling a source file.
In this case, the AOT compiled device binaries are embedded within the fat
object file.

Comment on lines 298 to 299
return ['-fsycl-targets=spir64_gen,spir64',
f'-Xs "-device {",".join(arch_list)}"']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return ['-fsycl-targets=spir64_gen,spir64',
f'-Xs "-device {",".join(arch_list)}"']
return ['-fsycl-targets=spir64_gen,spir64',
'-flink-huge-device-code',
f'-Xs "-device {",".join(arch_list)}"']

Copy link
Collaborator
@guangyey guangyey Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'-flink-huge-device-code' is only a link flag, right?

def _get_sycl_arch_list():
if 'TORCH_XPU_ARCH_LIST' in os.environ:
return os.environ.get('TORCH_XPU_ARCH_LIST')
def _get_sycl_arch_flag():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _get_sycl_arch_flag():
def _get_sycl_arch_flags():

sorry.

@guangyey guangyey self-requested a review March 19, 2025 07:40
]
_SYCL_DLINK_FLAGS += _get_sycl_arch_flag()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guangyey would you suggest a commit to change this flag to flags as well?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.

arch_list = torch.xpu.get_arch_list()
# Dropping dg2* archs since they lack hardware support for fp64 and require
# special consideration from the user. If needed these platforms can
# be requested thru TORCH_XPU_ARCH_LIST environment variable.
arch_list = [x for x in arch_list if not x.startswith('dg2')]
return ','.join(arch_list)
if len(arch_list) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check here is the only relevant change to the declared purpose of the PR. All other changes such as adjustment of _COMMON_SYCL_FLAGS, _SYCL_DLINK_FLAGS and adding more flags to _get_sycl_arch_flags is out of the declared scope of the PR per its title and its description. @jingxu10 : can you, please, adjust them? In particular, I am looking for the description change which would clarify why are you making all these other changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When setting an empty string as the aot target we don't want to perform aot compilation with ocloc. In this case, neither -fsycl-targets=spir64_gen,spir64 nor -device should be set. Otherwise, ocloc will crash during compilation. This is the reason why we move the setting of -fsycl-targets=spir64_gen,spir64 and -device into the condition when we would like to run AOT compilation with targets.

arch_list = torch.xpu.get_arch_list()
# Dropping dg2* archs since they lack hardware support for fp64 and require
# special consideration from the user. If needed these platforms can
# be requested thru TORCH_XPU_ARCH_LIST environment variable.
arch_list = [x for x in arch_list if not x.startswith('dg2')]
return ','.join(arch_list)
if len(arch_list) == 0:
return []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the behavior of built extension will be if we return empty [] arch flags list?

  1. Will it actually be buildable?
  2. For which arch(s) it will be built?
  3. Or it won't be pre-built for any AOT target and running extension will result in runtime compilation?

Secondly, is '-fsycl-targets=spir64_gen,spir64' still needed to be passed here?

I think empty arch list worths a comment left in the source code here.

Copy link
Collaborator Author
@jingxu10 jingxu10 Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compilation will still work, but without aot compilation with ocloc. As you mentioned in3, it won't be pre-built for any AOT target and running extension will result in runtime compilation.
-fsycl-targets=spir64_gen,spir64 cannot be here, otherwise ocloc will crash complaining no targets are set.

if len(arch_list) == 0:
return []
else:
return ['-fsycl-targets=spir64_gen,spir64',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving these flags here does not seem to actually work correctly. With this change, the following 2 warnings appear. I suggest you better drop these flags from the PR and do that separately. if needed

# python -m pytest test/test_cpp_extensions_jit.py -k xpu
...
test/test_cpp_extensions_jit.py [1/4] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=inline_jit_extension_xpu -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1018\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -c /home/dvrogozh/.cache/torch_extensions/py312_cpu/inline_jit_extension_xpu/main.cpp -o main.o
[2/4] icpx -DTORCH_EXTENSION_NAME=inline_jit_extension_xpu -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1018\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -fsycl -sycl-std=2020 -fsycl-host-compiler=c++ '-fsycl-host-compiler-options=-DTORCH_EXTENSION_NAME=inline_jit_extension_xpu -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\\"_gcc\\" -DPYBIND11_STDLIB=\\"_libstdcpp\\" -DPYBIND11_BUILD_ABI=\\"_cxxabi1018\\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17' -c -x c++ /home/dvrogozh/.cache/torch_extensions/py312_cpu/inline_jit_extension_xpu/sycl.sycl -o sycl.sycl.o
[3/4] icpx main.o sycl.sycl.o -o sycl_dlink.o -fsycl -fsycl-link --offload-compress -fsycl-targets=spir64_gen,spir64 -flink-huge-device-code -Xs "-device pvc"
icpx: warning: linked binaries do not contain expected 'spir64_gen-unknown-unknown' target; found targets: 'spir64-unknown-unknown' [-Wsycl-target]
icpx: warning: argument unused during compilation: '-flink-huge-device-code' [-Wunused-command-line-argument]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these 2 warning messages appear with empty aot or non-empty aot?

Copy link
Collaborator Author
@jingxu10 jingxu10 Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 warning messages don't seem to make sense to me. If the arch_list is not an empty list, flags passed to compilation should be exactly the same as before without these changes. If the arch_list is an empty list, neither spir64 target or the -flink-huge-device-code will be set into flags.

Copy link
Contributor
@dvrogozh dvrogozh Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These appear on pytorch initially built with TORCH_XPU_ARCH_LIST=pvc, i.e. on non-empty arch list. You have the pytest cmdline above to try it yourself: python -m pytest test/test_cpp_extensions_jit.py -k xpu. Please, make sure to find a root cause and remove the warning before this PR can be merged.

@jingxu10 jingxu10 force-pushed the jingxu10/cpp_extension_aot_main branch from 7475354 to 68c559c Compare March 24, 2025 00:55
@jingxu10
Copy link
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased jingxu10/cpp_extension_aot_main onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout jingxu10/cpp_extension_aot_main && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the jingxu10/cpp_extension_aot_main branch from 68c559c to fedcd0f Compare March 24, 2025 20:10
@pytorchmergebot pytorchmergebot force-pushed the jingxu10/cpp_extension_aot_main branch from feb8d5b to 876aa68 Compare April 1, 2025 01:44
@jingxu10
Copy link
Collaborator Author
jingxu10 commented Apr 3, 2025

@pytorchbot merge -f "lint is green "

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: Approvers from one of the following sets are needed:

  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
  • Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)
Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@EikanWang
Copy link
Collaborator

@pytorchbot rebase -b main

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

jingxu10 and others added 6 commits April 8, 2025 01:59
Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@pytorchmergebot
Copy link
Collaborator

Successfully rebased jingxu10/cpp_extension_aot_main onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout jingxu10/cpp_extension_aot_main && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the jingxu10/cpp_extension_aot_main branch from 876aa68 to 78de53a Compare April 8, 2025 01:59
@dvrogozh
Copy link
Contributor

@EikanWang, @guangyey : is xpu CI fixed now to let this PR be rebased and merged? I do have a follow up change ready to add a test for this change which is pending on a merge for a while.

# will be JIT compiled at runtime.
_COMMON_SYCL_FLAGS = [
'-fsycl',
'-fsycl-targets=spir64_gen,spir64' if _get_sycl_arch_list() != '' else '',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty bad to do something like that for the global variable, as at least for CUDA and results in device initialization

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope #152192 will address that.

@malfet
Copy link
Contributor
malfet commented Apr 24, 2025

@pytorchbot merge -f "Lint is green, XPU is red"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

wangkuiyi pushed a commit to wangkuiyi/pytorch that referenced this pull request Apr 25, 2025
If PyTorch is compiled with only AOT text strings starting with "dg2", the `_get_sycl_arch_list()` function will pass an empty string to `-device` argument of `ocloc` and then cause a compilation crash.
Pull Request resolved: pytorch#149459
Approved by: https://github.com/guangyey, https://github.com/dvrogozh, https://github.com/malfet

Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@dvrogozh
Copy link
1241
Contributor

@jingxu10 what is the test plan? Can it be automated?(i.e. we have test_cpp_extension running for XPU?)

I open #152192 to add such a test.

rec pushed a commit to rec/pytorch that referenced this pull request Apr 25, 2025
If PyTorch is compiled with only AOT text strings starting with "dg2", the `_get_sycl_arch_list()` function will pass an empty string to `-device` argument of `ocloc` and then cause a compilation crash.
Pull Request resolved: pytorch#149459
Approved by: https://github.com/guangyey, https://github.com/dvrogozh, https://github.com/malfet

Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

8 participants
0