Improvements for associative_scan - slicing of xs #138858
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138858
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 9e01fff with merge base fb36daa:
FLAKY - The following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Looks good overall.
- Can we clean up the comments in the test files so that all of them are up to date?
- The tests have a similar structure; can we come up with a _run_test method like what we did in test/inductor/test_control_flow.py? It's OK to make exceptions for non-standard tests, such as those that check for raised errors. This would make things much cleaner and easier to maintain and to add new tests to.
 if len(leaves) != len(out_leaves):
     raise RuntimeError(
         "The number of leaves of the pytree of the output of the operator needs to match the length of the pytree of the input"
     )
-if any(x.shape != shape for x in out_leaves):
+if any(x.shape != sliced_shape for x in out_leaves):
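For readers following the diff: sliced_shape refers to the shape of an xs leaf sliced along the scan dimension, not the full input shape. A minimal sketch of the idea, assuming the slice drops the scan dimension via select (the PR's actual helper and variable names may differ):

import torch

# Hypothetical stand-ins for the PR's variables: one xs leaf and the scan dim.
xs_leaf = torch.randn(4, 8, 3)
dim = 1

# Each call of combine_fn sees a slice along `dim`, so the expected output
# shape is the input shape with the scan dimension removed.
sliced_shape = xs_leaf.select(dim, 0).shape
print(sliced_shape)  # torch.Size([4, 3])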
We probably should also check strides, requires_grad, device, and dtype.
Done, I added checks for strides, device, and dtype. The checks for requires_grad will be added in the PR that contains the Autograd implementation.
test/functorch/test_control_flow.py (Outdated)
reverse=reverse,
)

self.assertEqual(result[1], expected_result[1])
What happens for result[0]?
Good catch. I only checked result[1] in that version, but in the new commit all results are checked.
@pytorchbot label "topic: not user facing"
89aa61e to 59b164b (Compare)
Thank you @ydwu4 for the review. I addressed the comments here.
    or x.device != x_sliced.device
    or x.stride() != x_sliced.stride()
    for x, x_sliced in zip(out_leaves, sliced_leaves)
):
    raise RuntimeError(
        "The pytree of the output of the operator needs to match the xs pytree"
The error message should make it clear what's not matched. A simple way would be to split this into 4 separate ifs.
I replaced the generic RuntimeError with an error that provides details about the metadata of the tensors. In particular:
raise RuntimeError(
f"The metadata of the output of the operator needs to match the meta data of the xs pytree"
f"\n xs metadata : {[(x.shape, x.dtype, x.device, x.stride()) for x in sliced_leaves]}"
f"\n operator output metadata: {[(x.shape, x.dtype, x.device, x.stride()) for x in out_leaves]}"
)
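For comparison, the per-field variant the reviewer suggested could look roughly like the sketch below. This is only an illustration (the PR kept the single combined message above); the helper name is made up, and the variable names mirror the diff:

def _check_out_matches_xs(out_leaves, sliced_leaves):
    # Raise a dedicated error per mismatched property instead of one combined message.
    for x, x_sliced in zip(out_leaves, sliced_leaves):
        if x.shape != x_sliced.shape:
            raise RuntimeError(f"Output shape {x.shape} does not match xs slice shape {x_sliced.shape}")
        if x.dtype != x_sliced.dtype:
            raise RuntimeError(f"Output dtype {x.dtype} does not match xs dtype {x_sliced.dtype}")
        if x.device != x_sliced.device:
            raise RuntimeError(f"Output device {x.device} does not match xs device {x_sliced.device}")
        if x.stride() != x_sliced.stride():
            raise RuntimeError(f"Output strides {x.stride()} do not match xs strides {x_sliced.stride()}")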
# Therefore, the parallelization is realized with vmap on `dim`
combine_fn = functools.partial(
    wrap_combine_fn_flat,
    combine_fn=torch.vmap(combine_fn, dim, dim),
Btw, does this also vmap over the additional inputs? (We shouldn't, right?)
No, we cannot. This is actually a problem that we need to tackle once additional inputs are supported. Do you have a better idea?
We could set the in_dims entry for the additional_inputs to None.
True, I will add a TODO for myself and keep it in mind for the Autograd implementation.
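For illustration, a minimal sketch of that suggestion using torch.vmap's in_dims; the combine_fn and the additional input here are made-up placeholders, not the PR's code:

import torch

def combine_fn(x, y, scale):
    # `scale` plays the role of an additional input that must not be batched.
    return (x + y) * scale

xs_a = torch.randn(4, 8)
xs_b = torch.randn(4, 8)
scale = torch.tensor(2.0)

# in_dims=(0, 0, None): map over dim 0 of the scanned operands, but pass the
# additional input unchanged to every call of combine_fn.
batched = torch.vmap(combine_fn, in_dims=(0, 0, None), out_dims=0)
out = batched(xs_a, xs_b, scale)
print(out.shape)  # torch.Size([4, 8])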
if combine_mode == "generic":
    # The generic_associative_scan implementation calls the combine_fn with a batch along the scan dimension
I think the main difficulty for me is understanding how vmap interacts with the recursive call in generic_associative_scan. Maybe we can use 2-D or 3-D tensor inputs to illustrate how it works.
I added an example for clarification.
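The example added in the PR is not reproduced here, but a rough 2-D sketch of the mechanism is: generic_associative_scan hands the wrapped combine_fn a batch along the scan dimension, and torch.vmap turns that into one call of the user's combine_fn per slice.

import torch

def combine_fn(x, y):
    # The user-facing combine_fn only ever sees slices along the scan dim,
    # e.g. shape (8,) when the input has shape (batch, 8) and dim=0.
    assert x.dim() == 1
    return x + y

dim = 0
wrapped = torch.vmap(combine_fn, in_dims=dim, out_dims=dim)

# One recursive step of generic_associative_scan pairs up chunks along the
# scan dim; two chunks of shape (2, 8) stand in for such a step here.
lhs = torch.randn(2, 8)
rhs = torch.randn(2, 8)
out = wrapped(lhs, rhs)  # combine_fn is called once per slice of shape (8,)
print(out.shape)         # torch.Size([2, 8])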
test/functorch/test_control_flow.py (Outdated)
torch._dynamo.reset()
super().setUp()

def _run_test(self, model, inputs, **kwargs):
Somehow I feel the interface of _run_test is difficult to understand. Specifically, I don't feel comfortable copying existing tests.
One thing we could do is write down the kwargs explicitly: e.g., make combine_mode, dim, reverse, compile_mode, and combine_fn explicit kwargs. Or should they just be initialized in the model we're going to run?
We can also remove the fake_combine_fn and make that test a standalone test, because I don't know when I should provide one.
We discussed the interface change of _run_test offline, and I incorporated it accordingly. Let me know what you think.
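Not the final interface that landed, but a rough sketch of the explicit-kwargs direction discussed above might look like this. The names, defaults, and the assumption that the scan-specific options (combine_fn, dim, reverse, combine_mode) are initialized on the model itself are all illustrative:

import torch

# Sketch only: a TestCase helper with explicit kwargs instead of an opaque **kwargs.
def _run_test(self, model, inputs, *, compile_mode="none", device=torch.device("cpu")):
    inputs = [inp.to(device) for inp in inputs]
    expected = model(*inputs)  # eager reference result

    if compile_mode == "none":
        fn = model
    elif compile_mode == "eager":
        fn = torch.compile(model, backend="eager", fullgraph=True)
    elif compile_mode == "compile":
        fn = torch.compile(model, fullgraph=True)
    else:  # "compile_dynamic_shape"
        fn = torch.compile(model, fullgraph=True, dynamic=True)

    result = fn(*inputs)
    self.assertEqual(result, expected)
    return result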
# @parametrize("compile_mode", ["none", "eager", "compile", "compile_dynamic_shape"]) | ||
# @parametrize("reverse", [False, True]) | ||
# @parametrize("device", [torch.device("cpu"), torch.device("cuda")]) | ||
@unittest.expectedFailure |
Left a TODO here?
There is an issue with using map inside associative_scan. We discussed this offline and I left a TODO.
Looking good! Left a few comments. Wait for CI to pass.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
In this PR, the combine_fn is consistently called with a slice along the scan dim. It implements part of pytorch#136966.
Pull Request resolved: pytorch#138858
Approved by: https://github.com/ydwu4
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @rec @ydwu4