-
Notifications
You must be signed in to change notification settings - Fork 24.2k
[xpu] set aot device flags in cpp_extension #149459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149459
Note: Links to docs will display an error until the docs builds have been completed. ❌ 4 New FailuresAs of commit 78de53a with merge base d7f3cd0 ( NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
8ba2d05
to
fc809aa
Compare
@pytorchbot label "topic: not user facing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Others LGTM.
@@ -284,25 +284,26 @@ def _join_sycl_home(*paths) -> str: | |||
|
|||
_COMMON_SYCL_FLAGS = [ | |||
'-fsycl', | |||
'-fsycl-targets=spir64_gen,spir64', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-fsycl-targets=spir64_gen,spir64
is a kernel flag as well I remember.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refer to https://github.com/intel/llvm/blob/sycl/sycl/doc/UsersManual.md, it is OK to remove this from common flags.
Normally, '-fsycl-targets' is specified when linking an application, in
which case the AOT compiled device binaries are embedded within the
application’s fat executable. However, this option may also be used in
combination with '-c' and '-fno-sycl-rdc' when compiling a source file.
In this case, the AOT compiled device binaries are embedded within the fat
object file.
torch/utils/cpp_extension.py
Outdated
return ['-fsycl-targets=spir64_gen,spir64', | ||
f'-Xs "-device {",".join(arch_list)}"'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return ['-fsycl-targets=spir64_gen,spir64', | |
f'-Xs "-device {",".join(arch_list)}"'] | |
return ['-fsycl-targets=spir64_gen,spir64', | |
'-flink-huge-device-code', | |
f'-Xs "-device {",".join(arch_list)}"'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'-flink-huge-device-code' is only a link flag, right?
torch/utils/cpp_extension.py
Outdated
def _get_sycl_arch_list(): | ||
if 'TORCH_XPU_ARCH_LIST' in os.environ: | ||
return os.environ.get('TORCH_XPU_ARCH_LIST') | ||
def _get_sycl_arch_flag(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _get_sycl_arch_flag(): | |
def _get_sycl_arch_flags(): |
sorry.
torch/utils/cpp_extension.py
Outdated
] | ||
_SYCL_DLINK_FLAGS += _get_sycl_arch_flag() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guangyey would you suggest a commit to change this flag
to flags
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
torch/utils/cpp_extension.py
Outdated
arch_list = torch.xpu.get_arch_list() | ||
# Dropping dg2* archs since they lack hardware support for fp64 and require | ||
# special consideration from the user. If needed these platforms can | ||
# be requested thru TORCH_XPU_ARCH_LIST environment variable. | ||
arch_list = [x for x in arch_list if not x.startswith('dg2')] | ||
return ','.join(arch_list) | ||
if len(arch_list) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check here is the only relevant change to the declared purpose of the PR. All other changes such as adjustment of _COMMON_SYCL_FLAGS
, _SYCL_DLINK_FLAGS
and adding more flags to _get_sycl_arch_flags
is out of the declared scope of the PR per its title and its description. @jingxu10 : can you, please, adjust them? In particular, I am looking for the description change which would clarify why are you making all these other changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When setting an empty string as the aot target we don't want to perform aot compilation with ocloc
. In this case, neither -fsycl-targets=spir64_gen,spir64
nor -device
should be set. Otherwise, ocloc
will crash during compilation. This is the reason why we move the setting of -fsycl-targets=spir64_gen,spir64
and -device
into the condition when we would like to run AOT compilation with targets.
torch/utils/cpp_extension.py
Outdated
arch_list = torch.xpu.get_arch_list() | ||
# Dropping dg2* archs since they lack hardware support for fp64 and require | ||
# special consideration from the user. If needed these platforms can | ||
# be requested thru TORCH_XPU_ARCH_LIST environment variable. | ||
arch_list = [x for x in arch_list if not x.startswith('dg2')] | ||
return ','.join(arch_list) | ||
if len(arch_list) == 0: | ||
return [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What the behavior of built extension will be if we return empty []
arch flags list?
- Will it actually be buildable?
- For which arch(s) it will be built?
- Or it won't be pre-built for any AOT target and running extension will result in runtime compilation?
Secondly, is '-fsycl-targets=spir64_gen,spir64'
still needed to be passed here?
I think empty arch list worths a comment left in the source code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compilation will still work, but without aot compilation with ocloc
. As you mentioned in3, it won't be pre-built for any AOT target and running extension will result in runtime compilation.
-fsycl-targets=spir64_gen,spir64
cannot be here, otherwise ocloc
will crash complaining no targets are set.
torch/utils/cpp_extension.py
Outdated
if len(arch_list) == 0: | ||
return [] | ||
else: | ||
return ['-fsycl-targets=spir64_gen,spir64', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving these flags here does not seem to actually work correctly. With this change, the following 2 warnings appear. I suggest you better drop these flags from the PR and do that separately. if needed
# python -m pytest test/test_cpp_extensions_jit.py -k xpu
...
test/test_cpp_extensions_jit.py [1/4] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=inline_jit_extension_xpu -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1018\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -c /home/dvrogozh/.cache/torch_extensions/py312_cpu/inline_jit_extension_xpu/main.cpp -o main.o
[2/4] icpx -DTORCH_EXTENSION_NAME=inline_jit_extension_xpu -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1018\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -fsycl -sycl-std=2020 -fsycl-host-compiler=c++ '-fsycl-host-compiler-options=-DTORCH_EXTENSION_NAME=inline_jit_extension_xpu -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\\"_gcc\\" -DPYBIND11_STDLIB=\\"_libstdcpp\\" -DPYBIND11_BUILD_ABI=\\"_cxxabi1018\\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17' -c -x c++ /home/dvrogozh/.cache/torch_extensions/py312_cpu/inline_jit_extension_xpu/sycl.sycl -o sycl.sycl.o
[3/4] icpx main.o sycl.sycl.o -o sycl_dlink.o -fsycl -fsycl-link --offload-compress -fsycl-targets=spir64_gen,spir64 -flink-huge-device-code -Xs "-device pvc"
icpx: warning: linked binaries do not contain expected 'spir64_gen-unknown-unknown' target; found targets: 'spir64-unknown-unknown' [-Wsycl-target]
icpx: warning: argument unused during compilation: '-flink-huge-device-code' [-Wunused-command-line-argument]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these 2 warning messages appear with empty aot or non-empty aot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 2 warning messages don't seem to make sense to me. If the arch_list
is not an empty list, flags passed to compilation should be exactly the same as before without these changes. If the arch_list
is an empty list, neither spir64 target or the -flink-huge-device-code
will be set into flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These appear on pytorch initially built with TORCH_XPU_ARCH_LIST=pvc
, i.e. on non-empty arch list. You have the pytest cmdline above to try it yourself: python -m pytest test/test_cpp_extensions_jit.py -k xpu
. Please, make sure to find a root cause and remove the warning before this PR can be merged.
7475354
to
68c559c
Compare
@pytorchbot rebase |
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
Successfully rebased |
68c559c
to
fedcd0f
Compare
feb8d5b
to
876aa68
Compare
@pytorchbot merge -f "lint is green " |
Merge startedYour change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: Approvers from one of the following sets are needed:
|
@pytorchbot rebase -b main |
@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here |
Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Successfully rebased |
876aa68
to
78de53a
Compare
@EikanWang, @guangyey : is xpu CI fixed now to let this PR be rebased and merged? I do have a follow up change ready to add a test for this change which is pending on a merge for a while. |
# will be JIT compiled at runtime. | ||
_COMMON_SYCL_FLAGS = [ | ||
'-fsycl', | ||
'-fsycl-targets=spir64_gen,spir64' if _get_sycl_arch_list() != '' else '', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pretty bad to do something like that for the global variable, as at least for CUDA and results in device initialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope #152192 will address that.
@pytorchbot merge -f "Lint is green, XPU is red" |
Merge startedYour change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
If PyTorch is compiled with only AOT text strings starting with "dg2", the `_get_sycl_arch_list()` function will pass an empty string to `-device` argument of `ocloc` and then cause a compilation crash. Pull Request resolved: pytorch#149459 Approved by: https://github.com/guangyey, https://github.com/dvrogozh, https://github.com/malfet Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com> Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
If PyTorch is compiled with only AOT text strings starting with "dg2", the `_get_sycl_arch_list()` function will pass an empty string to `-device` argument of `ocloc` and then cause a compilation crash. Pull Request resolved: pytorch#149459 Approved by: https://github.com/guangyey, https://github.com/dvrogozh, https://github.com/malfet Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com> Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
If PyTorch is compiled with only AOT text strings starting with "dg2", the
_get_sycl_arch_list()
function will pass an empty string to-device
argument ofocloc
and then cause a compilation crash.