8000 [Inductor] Fix the High Order Op layout issue by leslie-fang-intel · Pull Request #128275 · pytorch/pytorch · GitHub

Conversation

@leslie-fang-intel (Collaborator) commented Jun 8, 2024

Stack from ghstack (oldest at bottom):

Fix the issue: #127995

  • In the current implementation of creating `FallbackKernel`, the device of the `NoneLayout` is set to `None` when the `example_output` returned from `cls.process_kernel` is `None`.

    pytorch/torch/_inductor/ir.py

    Lines 5632 to 5649 in 921aa19

    ```python
    with context:
        (
            example_output,
            tensor_args,
            non_tensor_args,
            unflatten_args,
            unbacked_bindings,
        ) = cls.process_kernel(kernel, *args, **kwargs)
    if example_output is None:
        packed = cls(
            NoneLayout(None),
            kernel,
            tensor_args,
            non_tensor_args,
            unflatten_args,
            unbacked_bindings=unbacked_bindings,
        )
    ```
  • If an `ExternalKernel` scheduler node has a `None` device, the previously buffered kernels are not flushed before codegen of this node, which produces wrong generated code:

    ```python
    if not isinstance(node, NopKernelSchedulerNode) and (
        device := node.get_device()
    ):
        if (
            device != self.current_device
            or node.is_extern()
            or node.is_template()
        ):
            self.flush()
    ```
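The guard above uses an assignment expression, so a scheduler node whose `get_device()` returns `None` makes the whole outer condition falsy, and `self.flush()` is never reached even for an extern node. A minimal, torch-free sketch of that short-circuit (the `FakeNode` class and `should_flush` helper are hypothetical, for illustration only):

```python
# Hedged sketch, not Inductor code: shows why a None device
# short-circuits the flush guard. FakeNode/should_flush are hypothetical.
class FakeNode:
    def __init__(self, device, extern):
        self._device = device
        self._extern = extern

    def get_device(self):
        return self._device

    def is_extern(self):
        return self._extern


def should_flush(node, current_device):
    # Mirrors the walrus-operator guard: if get_device() returns None,
    # the outer condition is falsy and flush() is never reached.
    if device := node.get_device():
        return device != current_device or node.is_extern()
    return False


print(should_flush(FakeNode("cpu", True), "cpu"))  # True: extern node flushes
print(should_flush(FakeNode(None, True), "cpu"))   # False: None device skips flush
```

This is exactly the interaction the PR fixes: once `NoneLayout` carries a real device, the extern node no longer falls through the guard.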

Test Plan

```
python -u -m pytest -s -v test/higher_order_ops/test_with_effects.py -k test_compile_inductor_external_op_return_none
```

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

@pytorch-bot (bot) commented Jun 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128275

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 9d761e1 with merge base a3af32c:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@leslie-fang-intel leslie-fang-intel added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 8, 2024
@leslie-fang-intel leslie-fang-intel added this to the 2.4.0 milestone Jun 8, 2024
leslie-fang-intel added a commit that referenced this pull request Jun 8, 2024
ghstack-source-id: 39c641c
Pull Request resolved: #128275
@zou3519 zou3519 requested a review from eellison June 10, 2024 15:17
@eellison (Contributor) left a comment

looks good, one comment

```diff
@@ -198,6 +198,38 @@ def f(x):
         res = torch.compile(f, backend="inductor")(*inputs)
         self.assertTrue(torch.allclose(res, f(*inputs)))

+    @unittest.skipIf(IS_WINDOWS, "triton")
+    @unittest.skipIf(TEST_WITH_ROCM, "triton")
+    @unittest.skipIf(_get_torch_cuda_version() >= (11, 7), "triton")
```
Contributor

This doesn't test CUDA, so why are we skipping here? Also, you can check directly via `from torch.utils._triton import has_triton`.
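As I read the suggestion, the skip should be gated on Triton availability rather than on a CUDA version check; a hedged sketch (with a fallback stub so the snippet runs even where torch is absent — the test class and method names are illustrative):

```python
import unittest

try:
    from torch.utils._triton import has_triton
except ImportError:
    # Fallback stub for environments without torch; the real helper
    # reports whether a working Triton install is available.
    def has_triton() -> bool:
        return False


class ExampleTritonTest(unittest.TestCase):
    # Skip when Triton is unavailable, instead of guessing from CUDA versions.
    @unittest.skipUnless(has_triton(), "requires triton")
    def test_needs_triton(self):
        pass
```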

@leslie-fang-intel (Collaborator, Author) commented Jun 14, 2024

Removed it; let's see what CI thinks. I just copied these skips from the UT `test_register_effectful_custom_op` in this same file. cc @zou3519 @angelayi: do you know of any corner case that requires `test_register_effectful_custom_op` to skip the triton testing?

Collaborator Author

I think I need to skip this test on Windows; the pre-CI reports an error in `win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral)`:

```
2024-06-14T03:11:36.4507249Z torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
2024-06-14T03:11:36.4508166Z InvalidCxxCompiler: No working C++ compiler found in torch._inductor.config.cpp.cxx: (None, 'g++')
```

Contributor

I think you just need to skip on Windows.

Collaborator Author

Thanks, skipped this test on Windows.
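The resulting skip is straightforward; a minimal sketch (the test name matches this PR's test, but the class and decorator placement are illustrative):

```python
import sys
import unittest

# Hedged sketch: Inductor's C++ backend finds no default compiler on
# Windows (the InvalidCxxCompiler failure above), so skip there.
IS_WINDOWS = sys.platform == "win32"


class ExampleWithEffectsTest(unittest.TestCase):
    @unittest.skipIf(IS_WINDOWS, "no working C++ compiler by default on Windows")
    def test_compile_inductor_external_op_return_none(self):
        pass
```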

```diff
 if example_output is None:
     packed = cls(
-        NoneLayout(None),
+        NoneLayout(device),
```
Contributor

Now that we're propagating the device here, can we remove some of the `if device is not None` checks from the original PR?

493478d#diff-d2539c9c8dc6a3d7e457767a880612e96d3c85752a77ead49a9e4e00a3e4c3c7R2443

Collaborator Author

From the implementation of `FallbackKernel.find_device`:

pytorch/torch/_inductor/ir.py

Lines 5455 to 5471 in 685fcfb

```python
def find_device(tensor_args, example_output):
    if tensor_args:
        devices = [arg.get_device() for arg in tensor_args if arg.get_device()]
        return devices[0]
    if isinstance(example_output, torch.Tensor):
        return example_output.device
    if isinstance(example_output, (list, tuple)):
        device_set = {FallbackKernel.find_device(None, x) for x in example_output}
        # Remove None
        devices = [device for device in device_set if device]
        if len(devices) == 1:
            return devices[0]
        for device in devices:
            if is_gpu(device.type):
                return device
        return devices[0]
    return None
```
I think the device may still be `None`.
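To illustrate that `None` path, here is a simplified, torch-free mirror of the logic quoted above (hedged sketch only: the `torch.Tensor` branch is elided, since the case in question has no tensor args and a `None` example output):

```python
# Simplified, torch-free mirror of FallbackKernel.find_device, kept only to
# show the path that returns None; the torch.Tensor branch is elided.
def find_device(tensor_args, example_output):
    if tensor_args:
        devices = [arg.get_device() for arg in tensor_args if arg.get_device()]
        return devices[0]
    if isinstance(example_output, (list, tuple)):
        device_set = {find_device(None, x) for x in example_output}
        devices = [device for device in device_set if device]
        if devices:
            return devices[0]
    # An op that returns None (the case in this PR) lands here.
    return None


print(find_device(None, None))          # None
print(find_device(None, (None, None)))  # None
```

So propagating the device into `NoneLayout` narrows the problem but cannot, by itself, remove every `device is not None` check.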

leslie-fang-intel added a commit that referenced this pull request Jun 14, 2024
ghstack-source-id: 9fa09db
Pull Request resolved: #128275
pytorch-bot bot pushed a commit that referenced this pull request Jun 14, 2024
Differential Revision: D58590076
@leslie-fang-intel (Collaborator, Author)

Hi @zou3519 @angelayi, it looks like there is an import of this PR in 31df0c6; please let me know when you think it's OK to land this PR.

@angelayi (Contributor)

@leslie-fang-intel Yes! It is fine to land this PR!

@leslie-fang-intel (Collaborator, Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@zou3519 (Contributor) commented Jun 17, 2024

@leslie-fang-intel, thank you for the fix. I'll submit this for a cherry-pick onto the 2.4 branch.

zou3519 pushed a commit that referenced this pull request Jun 17, 2024
Pull Request resolved: #128275
Approved by: https://github.com/eellison
atalman pushed a commit that referenced this pull request Jun 19, 2024
Pull Request resolved: #128275
Approved by: https://github.com/eellison

Co-authored-by: leslie-fang-intel <leslie.fang@intel.com>
@github-actions github-actions bot deleted the gh/leslie-fang-intel/111/head branch July 18, 2024 01:56