Fix issue #149829
During the torch.export initialization stage, the context flags of torch.backends.mkldnn are initialized in the function _ignore_backend_decomps in torch/export/_trace.py.
It is wrong to trigger the no-GPU warning when the value is merely being set to False in a CPU-only environment. The correct behavior is to raise the warning only when the user tries to turn the flag on while no GPU is available.
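For illustration, a minimal Python sketch of the intended guard (the function name and the torch.xpu.is_available() check are assumptions, not the actual PyTorch internals, which live in C++):

```python
import warnings

import torch

# Hypothetical sketch: warn only when the user turns the flag on
# while no XPU device is available; setting it to False on a
# CPU-only build stays silent.
def set_onednn_allow_tf32(enabled: bool) -> None:
    if enabled and not torch.xpu.is_available():
        warnings.warn("TF32 for oneDNN has no effect without an Intel GPU.")
    # (The actual flag update happens in PyTorch's C++ context;
    # it is omitted here.)
```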
Stack from ghstack (oldest at bottom):
Update
c844b23
[ghstack-poisoned]
Note: Links to docs will display an error until the docs builds have been completed.
As of commit f15d95b with merge base 86dcdf9:
/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/sstream:152:52: error: expected value in expression
👉 Rebase onto the `viable/strict` branch to avoid these failures
Process completed with exit code 1.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
[Intel GPU] trigger tf32 no-gpu warn only when setting true
ce00285
ghstack-source-id: 2e37c38 Pull Request resolved: #149926
Thanks! Would it be possible to cherry pick this into 2.7?
@justinchuby Sure, I will cherry-pick it once this PR finishes review. Thank you again for pointing out this issue and helping us improve the quality.
Apply suggestions from code review
da91b07
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
eaa80dc
Successfully rebased gh/ZhiweiYan-96/57/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/149926)
4c326b1
f762102
@pytorchbot rebase -b main
@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here
6ed8781
Successfully rebased gh/ZhiweiYan-96/57/orig onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/149926)
27218f5
ghstack-source-id: 4b9daa6 Pull Request resolved: #149926
Please fix the lint.
Why is the lint issue not captured?
I wrongly thought you used some Git button to fix the lint issue since you resolved the comment... hah, I will fix them, thanks for the reminder.
@EikanWang this file is not in .lintrunner.toml. (You may see other if(b)-like cases in this file.) Maybe this is a choice of other maintainers...
fixed in the new commit
No. Refer to pytorch/.lintrunner.toml, lines 55 to 57 in a8d0c5c.
> Please fix the lint.
>
> I wrongly thought you used some Git button to fix the lint issue since you resolved the comment... hah, I will fix them, thanks for the reminder.
Haha, just a reminder. Using the Git button would break the ghstack merge label, so I reverted my patch.
f15d95b
e25f057
ghstack-source-id: 7f99dea Pull Request resolved: #149926
@malfet May I know if this hotfix is reasonable for you? It will suppress the unexpected warning message.
Hi @malfet, could you please take a look at this PR? Thanks!
@malfet @atalman This PR aims to suppress an unexpected warning message and is targeted for the release branch. May I know if you have any comments?
Hi @malfet @atalman, could you take a look at this PR when you have time? I would appreciate your suggestions. This PR introduces a minor, straightforward change to the warning logic for the TF32 setting. It fixes #149829 and would be nice to land in the 2.7 release. Thanks.
@guangyey do you know why this code is used in cpu-only code?
Hi @malfet, it is because torch.export initializes the backend flags. The backend initialization code is shared by CPU and XPU. Below is a simple backtrace from pdb.
```
-> return _export(
  /4T-720/conda_envs/zhiwei-int4/lib/python3.10/site-packages/torch/export/_trace.py(1072)wrapper()
-> ep = fn(*args, **kwargs)
  /4T-720/conda_envs/zhiwei-int4/lib/python3.10/site-packages/torch/export/exported_program.py(122)wrapper()
-> return fn(*args, **kwargs)
  /4T-720/conda_envs/zhiwei-int4/lib/python3.10/site-packages/torch/export/_trace.py(2111)_export()
-> ep = _export_for_training(
  /4T-720/conda_envs/zhiwei-int4/lib/python3.10/site-packages/torch/export/_trace.py(1072)wrapper()
-> ep = fn(*args, **kwargs)
  /4T-720/conda_envs/zhiwei-int4/lib/python3.10/site-packages/torch/export/exported_program.py(122)wrapper()
-> return fn(*args, **kwargs)
  /4T-720/conda_envs/zhiwei-int4/lib/python3.10/site-packages/torch/export/_trace.py(1973)_export_for_training()
-> export_artifact = export_func(
  /4T-720/conda_envs/zhiwei-int4/lib/python3.10/site-packages/torch/export/_trace.py(1916)_non_strict_export()
-> aten_export_artifact = _to_aten_func(  # type: ignore[operator]
  /4T-720/conda_envs/zhiwei-int4/lib/python3.10/site-packages/torch/export/_trace.py(1696)_export_to_aten_ir_make_fx()
-> with torch.nn.utils.stateless._reparametrize_module(
  /4T-720/conda_envs/zhiwei-int4/lib/python3.10/contextlib.py(142)__exit__()
-> next(self.gen)
  /4T-720/conda_envs/zhiwei-int4/lib/python3.10/site-packages/torch/export/_trace.py(173)_ignore_backend_decomps()
-> torch.backends.mkldnn.set_flags(*orig_mkldnn_flag)
```
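For context, a minimal repro sketch of #149829 (the module below is arbitrary; the exact repro shape is an assumption based on the backtrace above):

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

# On a CPU-only build, this export used to emit the no-GPU TF32
# warning, because restoring the backend flags on context exit
# calls the TF32 setter even though nothing was enabled.
ep = torch.export.export(M(), (torch.randn(2),))
```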
@ZhiweiYan-96 IMO a better solution would have been for torch._C._get_onednn_allow_tf32() to return None if compiled without XPU:
```diff
% git diff
diff --git a/torch/csrc/Module.cpp b/torch/csrc/Module.cpp
index 9d39a5872b6..bdb720377d4 100644
--- a/torch/csrc/Module.cpp
+++ b/torch/csrc/Module.cpp
@@ -965,10 +965,14 @@ static PyObject* THPModule_setAllowTF32OneDNN(
 static PyObject* THPModule_allowTF32OneDNN(
     PyObject* _unused,
     PyObject* noargs) {
+#ifndef USE_XPU
   if (at::globalContext().allowTF32OneDNN())
     Py_RETURN_TRUE;
   else
     Py_RETURN_FALSE;
+#else
+  Py_RETURN_NONE;
+#endif
 }
```
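On the Python side, a caller could then skip the restore when the getter reports None. A sketch under two assumptions: the getter returns None on non-XPU builds as in the diff above, and a matching private _set_onednn_allow_tf32 binding exists:

```python
import torch

# Assumption: _get_onednn_allow_tf32() returns None when PyTorch is
# built without XPU support (per the suggested diff above).
flag = torch._C._get_onednn_allow_tf32()
if flag is not None:
    # Restore the flag only when the build actually supports it;
    # the setter name here is assumed, not verified.
    torch._C._set_onednn_allow_tf32(flag)
```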
Also, if you want to cherry-pick this into 2.7, please add a regression test.
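A regression test along the requested lines might look like this sketch (hypothetical test name and placement; it only asserts that exporting on a CPU-only build stays warning-free):

```python
import warnings

import torch

def test_export_emits_no_tf32_warning_on_cpu():
    class M(torch.nn.Module):
        def forward(self, x):
            return x * 2

    with warnings.catch_warnings():
        # Turn any warning into an error so the test fails if the
        # no-GPU TF32 warning (or any other warning) fires.
        warnings.simplefilter("error")
        torch.export.export(M(), (torch.ones(3),))
```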
@malfet Thanks for your suggestions; that seems more reasonable than the current change. I will follow your advice and push a commit, thanks.
@ZhiweiYan-96 any updates?
Hi @malfet, I saw you pushed #150358 to close the issue. I missed the tracking here due to some urgent matters. Sincere appreciation for your help.
The issue has been fixed in #150358
EikanWang approved these changes
guangyey approved these changes
Awaiting requested review from malfet
Awaiting requested review from atalman