10000 [Inductor] Fix the lowering of squeeze when input is not contiguous by leslie-fang-intel · Pull Request #146746 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

[Inductor] Fix the lowering of squeeze when input is not contiguous #146746

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

leslie-fang-intel
Copy link
Collaborator
@leslie-fang-intel leslie-fang-intel commented Feb 8, 2025

Stack from ghstack (oldest at bottom):

Summary
Fix issue #143498. The issue happens when we lowering select = torch.ops.aten.select.int(cat, 1, 0).

For example, when cat is contiguous with size[2, 2] stride[2,1]

  • for eager, it returns a view of size[2,] stride[2,]
  • for Inductor lowering, it returns wrong stride 1 instead of 2
TensorBox(
  ReinterpretView(
    StorageBox(
      ConcatKernel(name='buf10', layout=FixedLayout('cpu', torch.int64, size=[u0, 2], stride=[2, 1]), inputs=[ComputedBuffer(name='buf8', layout=NonOwningLayout('cpu', torch.int64, size=[u0, 1], stride=[2, 1]), data=Pointwise(device=device(type='cpu'), dtype=torch.int64, inner_fn=<function ReinterpretView.make_loader.<locals>.loader at 0x7f6b856449d0>, ranges=[u0, 1])), ComputedBuffer(name='buf9', layout=NonOwningLayout('cpu', torch.int64, size=[u0, 1], stride=[2, 1]), data=Pointwise(device=device(type='cpu'), dtype=torch.int64, inner_fn=<function ReinterpretView.make_loader.<locals>.loader at 0x7f6b85644790>, ranges=[u0, 1]))])
    ),
    FixedLayout('cpu', torch.int64, size=[u0], stride=[**1**]),
    origins=OrderedSet([select])
  )
)

To fix this issue, we give the right stride when lowering of squeeze.

Test Plan

python -u -m pytest -s -v test/inductor/test_unbacked_symints.py -k test_issue_143498

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]
Copy link
pytorch-bot bot commented Feb 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146746

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 0e955ea with merge base fa0592b (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

leslie-fang-intel added a commit that referenced this pull request Feb 8, 2025
@leslie-fang-intel leslie-fang-intel added topic: not user facing topic category ciflow/trunk Trigger trunk jobs on your pull request labels Feb 8, 2025
@leslie-fang-intel leslie-fang-intel marked this pull request as draft February 8, 2025 04:30
[ghstack-poisoned]
leslie-fang-intel added a commit that referenced this pull request Feb 8, 2025
@leslie-fang-intel leslie-fang-intel marked this pull request as ready for review February 9, 2025 02:56
[ghstack-poisoned]
leslie-fang-intel added a commit that referenced this pull request Feb 10, 2025
@sanchitintel

This comment was marked as resolved.

@eellison
Copy link
Contributor

for Inductor lowering, it returns wrong stride 1 instead of 2

What is returning a wrong stride ? The general contract is that intermediary inductor ir can have different strides than eager. if we are to run a fallback operator, we should be re-running it with FakeTensorMode in process_kernel with the correct strides from inductor ir.

I think something is going wrong in that process. We should not need to constrain inductor ir intermediates to exactly match the striding from eager.

Comment on lines 1023 to 1030
return (
(
as_strided(x, new_shape, new_stride, new_offset)
if is_storage_and_layout
else view(x, new_shape)
)
if new_shape != x.get_size()
else x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not artificially induced as_strided here. The problem for this is the same problem that we had in my existing pr. Which is that x here is not contiguous:

pytorch/torch/_inductor/ir.py

Lines 2776 to 2780 in de964b9

# due to the size_hint's inability to process unbacked SymInts
# TODO: unbacked should not diverge from backed in determining striding
# Need to require contiguous here instead of realize, see:
# https://github.com/pytorch/pytorch/issues/145561
x = ExternKernel.require_contiguous(x)

This is another issue of apis not working well but regardless, if you update it to

x = ExternKernel.require_exact_strides(x, FlexibleLayout.contiguous_strides(x.get_size())) the test passes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @eellison, thanks for your comment. Understanding that, ExternKernel.require_contiguous only requires the stride order but not the exact stride number which causes this problem. Looking forward to your suggestions of fixing here: should we change x = ExternKernel.require_contiguous(x) to x = ExternKernel.require_exact_strides(x, FlexibleLayout.contiguous_strides(x.get_size()))? or change the implementation of require_contiguous as require_exact_strides.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do ExternKernel.require_exact_strides(x, FlexibleLayout.contiguous_strides(x.get_size())). we can separately track making this the default

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. @eellison please take a look again.

[ghstack-poisoned]
leslie-fang-intel added a commit that referenced this pull request Feb 14, 2025
@leslie-fang-intel
Copy link
Collaborator Author

@pytorchbot merge -f "unrelated CI failure"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Raymo111 pushed a commit that referenced this pull request Feb 20, 2025
…146746)

**Summary**
Fix issue #143498. The issue happens when we lowering `select = torch.ops.aten.select.int(cat, 1, 0)`.

For example, when `cat` is contiguous with size[2, 2] stride[2,1]

- for eager, it returns a view of size[2,] stride[2,]
- for Inductor lowering, it returns wrong stride 1 instead of 2
```
TensorBox(
  ReinterpretView(
    StorageBox(
      ConcatKernel(name='buf10', layout=FixedLayout('cpu', torch.int64, size=[u0, 2], stride=[2, 1]), inputs=[ComputedBuffer(name='buf8', layout=NonOwningLayout('cpu', torch.int64, size=[u0, 1], stride=[2, 1]), data=Pointwise(device=device(type='cpu'), dtype=torch.int64, inner_fn=<function ReinterpretView.make_loader.<locals>.loader at 0x7f6b856449d0>, ranges=[u0, 1])), ComputedBuffer(name='buf9', layout=NonOwningLayout('cpu', torch.int64, size=[u0, 1], stride=[2, 1]), data=Pointwise(device=device(type='cpu'), dtype=torch.int64, inner_fn=<function ReinterpretView.make_loader.<locals>.loader at 0x7f6b85644790>, ranges=[u0, 1]))])
    ),
    FixedLayout('cpu', torch.int64, size=[u0], stride=[**1**]),
    origins=OrderedSet([select])
  )
)
```

To fix this issue, we give the right stride when lowering of `squeeze`.

**Test Plan**
```
python -u -m pytest -s -v test/inductor/test_unbacked_symints.py -k test_issue_143498
```

Pull Request resolved: #146746
Approved by: https://github.com/jgong5, https://github.com/sanchitintel, https://github.com/eellison
pytorch-bot bot pushed a commit that referenced this pull request Feb 24, 2025
…146746)

**Summary**
Fix issue #143498. The issue happens when we lowering `select = torch.ops.aten.select.int(cat, 1, 0)`.

For example, when `cat` is contiguous with size[2, 2] stride[2,1]

- for eager, it returns a view of size[2,] stride[2,]
- for Inductor lowering, it returns wrong stride 1 instead of 2
```
TensorBox(
  ReinterpretView(
    StorageBox(
      ConcatKernel(name='buf10', layout=FixedLayout('cpu', torch.int64, size=[u0, 2], stride=[2, 1]), inputs=[ComputedBuffer(name='buf8', layout=NonOwningLayout('cpu', torch.int64, size=[u0, 1], stride=[2, 1]), data=Pointwise(device=device(type='cpu'), dtype=torch.int64, inner_fn=<function ReinterpretView.make_loader.<locals>.loader at 0x7f6b856449d0>, ranges=[u0, 1])), ComputedBuffer(name='buf9', layout=NonOwningLayout('cpu', torch.int64, size=[u0, 1], stride=[2, 1]), data=Pointwise(device=device(type='cpu'), dtype=torch.int64, inner_fn=<function ReinterpretView.make_loader.<locals>.loader at 0x7f6b85644790>, ranges=[u0, 1]))])
    ),
    FixedLayout('cpu', torch.int64, size=[u0], stride=[**1**]),
    origins=OrderedSet([select])
  )
)
```

To fix this issue, we give the right stride when lowering of `squeeze`.

**Test Plan**
```
python -u -m pytest -s -v test/inductor/test_unbacked_symints.py -k test_issue_143498
```

Pull Request resolved: #146746
Approved by: https://github.com/jgong5, https://github.com/sanchitintel, https://github.com/eellison
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Mar 4, 2025
…ytorch#146746)

**Summary**
Fix issue pytorch#143498. The issue happens when we lowering `select = torch.ops.aten.select.int(cat, 1, 0)`.

For example, when `cat` is contiguous with size[2, 2] stride[2,1]

- for eager, it returns a view of size[2,] stride[2,]
- for Inductor lowering, it returns wrong stride 1 instead of 2
```
TensorBox(
  ReinterpretView(
    StorageBox(
      ConcatKernel(name='buf10', layout=FixedLayout('cpu', torch.int64, size=[u0, 2], stride=[2, 1]), inputs=[ComputedBuffer(name='buf8', layout=NonOwningLayout('cpu', torch.int64, size=[u0, 1], stride=[2, 1]), data=Pointwise(device=device(type='cpu'), dtype=torch.int64, inner_fn=<function ReinterpretView.make_loader.<locals>.loader at 0x7f6b856449d0>, ranges=[u0, 1])), ComputedBuffer(name='buf9', layout=NonOwningLayout('cpu', torch.int64, size=[u0, 1], stride=[2, 1]), data=Pointwise(device=device(type='cpu'), dtype=torch.int64, inner_fn=<function ReinterpretView.make_loader.<locals>.loader at 0x7f6b85644790>, ranges=[u0, 1]))])
    ),
    FixedLayout('cpu', torch.int64, size=[u0], stride=[**1**]),
    origins=OrderedSet([select])
  )
)
```

To fix this issue, we give the right stride when lowering of `squeeze`.

**Test Plan**
```
python -u -m pytest -s -v test/inductor/test_unbacked_symints.py -k test_issue_143498
```

Pull Request resolved: pytorch#146746
Approved by: https://github.com/jgong5, https://github.com/sanchitintel, https://github.com/eellison
@github-actions github-actions bot deleted the gh/leslie-fang-intel/180/head branch March 23, 2025 02:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0