[AOTI] Fix #140546 and support AOTI package load for Intel GPU. by etaf · Pull Request #140664 · pytorch/pytorch · GitHub

[AOTI] Fix #140546 and support AOTI package load for Intel GPU. #140664


Closed
etaf wants to merge 31 commits

Conversation

[ghstack-poisoned]
pytorch-bot bot commented Nov 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140664

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure

As of commit 2abadda with merge base f84e533:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]
@etaf etaf marked this pull request as ready for review November 14, 2024 08:38
@etaf etaf requested a review from desertfire November 14, 2024 08:39
@etaf etaf added the ciflow/xpu Run XPU CI tasks label Nov 14, 2024
const std::string& model_so_path,
size_t num_models,
const std::string& device_str,
const std::string& cubin_dir) {
Collaborator

Suggested change
const std::string& cubin_dir) {
const std::string& kernel_bin_dir) {

Collaborator Author

resolved


std::vector<at::Tensor> run(
virtual std::vector<at::Tensor> run(const std::vector<at::Tensor>& inputs);
Collaborator

This changes the API definition. According to the issue description, a null stream means the current stream for CUDA; the same semantics should also apply to Intel GPU. Since this PR has fixed the virtual-function routine, the runner instance can dispatch the run function to the right device-specific container runner.

Collaborator Author

This relates to how we define the run API. At first I thought it would be more generic to define it as std::vector<at::Tensor> run(const std::vector<at::Tensor>& inputs, AOTInductorStreamHandle stream_handle = nullptr);. But that means model_container_runner_cuda.h, model_container_runner_xpu.h, and model_container_runner_cpu.h would each need to define this override as well, and since AOTIModelContainerRunnerCpu/Xpu/Cuda are used in pybind, we get a compilation error:

torch/csrc/inductor/aoti_runner/pybind.cpp:21:11: required from here
/usr/include/c++/11/type_traits:1422:38: error: invalid use of incomplete type ‘struct AOTInductorStreamOpaque’
1422 | : public integral_constant<bool, __is_base_of(_Base, _Derived)>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/xinanlin/xinanlin/pytorch/torch/csrc/inductor/aoti_runner/model_container_runner.h:5,
from torch/csrc/inductor/aoti_runner/model_container_runner_cpu.h:4,
from torch/csrc/inductor/aoti_runner/pybind.cpp:1:
torch/csrc/inductor/aoti_runtime/interface.h:16:8: note: forward declaration of ‘struct AOTInductorStreamOpaque’

The current design of this PR defines the interface of the cpu/xpu/cuda container runners as std::vector<at::Tensor> run(const std::vector<at::Tensor>& inputs);, which is the same as before this PR, so I think there is no issue with landing it.

And if we do prefer the run interface to be std::vector<at::Tensor> run(const std::vector<at::Tensor>& inputs, AOTInductorStreamHandle stream_handle = nullptr);, we can create an adapter between the pybind interface and the AOTI container runner to resolve the above compilation issue in a new PR.

Contributor

Agree with @EikanWang, this is an API-breaking change. I think if you declare stream_handle as void*, it should solve your compilation issue.

Collaborator Author

Hi @desertfire, I've updated the PR, could you please take another look? BTW, could you please also review this series of PRs? They are all for AOTI XPU and are ready for review.

@etaf etaf added the topic: not user facing topic category label Nov 14, 2024
[ghstack-poisoned]
[ghstack-poisoned]

etaf added 2 commits November 19, 2024 22:27
[ghstack-poisoned]
[ghstack-poisoned]
@etaf etaf requested review from desertfire and EikanWang November 20, 2024 07:54
etaf added 2 commits November 20, 2024 00:03
[ghstack-poisoned]
[ghstack-poisoned]
pytorchmergebot added a commit that referenced this pull request Dec 9, 2024
…U. (#140664)"

This reverts commit 91d3054.

Reverted #140664 on behalf of https://github.com/clee2000 due to breaks forward compatibility?  D66937097 ([comment](#140269 (comment)))
@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Dec 9, 2024
… GPU."

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* #140686
* __->__ #140664
* #140269
* #140268
* #135320
* #135318
* #139026 


Fix #140546



cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov

[ghstack-poisoned]
desertfire added a commit that referenced this pull request Dec 9, 2024
desertfire added a commit that referenced this pull request Dec 9, 2024
@desertfire
Contributor

Relanding, see #140269 (comment)

@desertfire
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

desertfire added a commit that referenced this pull request Dec 9, 2024
desertfire added a commit that referenced this pull request Dec 9, 2024
@etaf
Collaborator Author
etaf commented Dec 10, 2024

The failed job: xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 4, 4, linux.idc.xpu) (gh)
inductor/test_torchinductor_opinfo.py::TestInductorOpInfoXPU::test_comprehensive_masked_cumprod_xpu_float16

is a known issue #141861 and has been fixed on the main branch by #142348.

Please ignore it.

@desertfire can we land this PR after CI finishes?

@desertfire
Contributor

> The failed job: xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 4, 4, linux.idc.xpu) (gh) inductor/test_torchinductor_opinfo.py::TestInductorOpInfoXPU::test_comprehensive_masked_cumprod_xpu_float16
>
> is a known issue #141861 and has been fixed on the main branch by #142348.
>
> Please ignore it.
>
> @desertfire can we land this PR after CI finishes?

SGTM

@etaf
Collaborator Author
etaf commented Dec 10, 2024

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 4, 4, linux.idc.xpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

mori360 pushed a commit to mori360/pytorch that referenced this pull request Dec 11, 2024
bluenote10 pushed a commit to bluenote10/pytorch that referenced this pull request Dec 14, 2024
bluenote10 pushed a commit to bluenote10/pytorch that referenced this pull request Dec 14, 2024
Esquains pushed a commit to Esquains/study1 that referenced this pull request Dec 15, 2024
@github-actions github-actions bot deleted the gh/etaf/63/head branch January 11, 2025 02:08
Labels
ci-no-td (Do not run TD on this PR), ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), ciflow/xpu (Run XPU CI tasks), Merged, module: inductor, open source, Reverted, topic: not user facing (topic category)
Projects
Status: Done
5 participants