[scan] Autograd with partial gradient support #146285
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146285
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Cancelled Job: As of commit ebe8c22 with merge base 4273e5d, the following job was cancelled. Please retry.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
We're heading in the right direction. A big concern is readability; we should figure out a way to significantly reduce the complexity around all kinds of masking.
torch/_higher_order_ops/scan.py
Outdated
combine_fn,
False,
(
    *fw_init,
Do fw_init, fw_xs, and fw_additional_inputs always have requires_grad = True? Can we add a check before this?
No, not necessarily. All of fw_init, fw_xs, and fw_additional_inputs are required even if they don't require a gradient, because the joint_graph wouldn't work otherwise. In the revised version we return torch.zeros_like() for the elements that don't require gradients (see the sketch below).
Moreover, is there anything from #142518 to consider here, or is there anything blocking CUDA graph support? I hope I replaced all the Nones with torch.zeros_like(), but I may have overlooked something.
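To make the zeros_like point concrete, here is a minimal, hypothetical sketch (the helper `_fill_none_grads` is illustrative only and not the actual scan.py implementation):

```python
import torch

def _fill_none_grads(grads, inputs):
    # Illustrative helper: the joint graph must return one gradient per
    # input, so gradients of inputs that do not require grad are replaced
    # with zero tensors instead of None, keeping the graph structure fixed.
    return [
        g if g is not None else torch.zeros_like(inp)
        for g, inp in zip(grads, inputs)
    ]

# Example: only the first input requires a gradient.
a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=False)
out = (a * 2 + b).sum()
(grad_a,) = torch.autograd.grad(out, [a])
# Pretend the joint graph produced (grad_a, None) for (a, b):
full_grads = _fill_none_grads([grad_a, None], [a, b])
print([g.shape for g in full_grads])  # both entries are tensors now
```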
torch/_higher_order_ops/scan.py
Outdated
carried_g_additional_input = args[:num_additional_inputs]

g_c, g_xs = _extract_carry_and_out(
    joint_graph(*args[num_additional_inputs:]), num_init
IIUC, the overall plan is that 1. we trace the fw_bw of combine_fn and get fw_bw_graph, and 2. we trace fw_bw_graph + gradient accumulation logic? Can you write this down somewhere upfront and justify why it is necessary?
A big concern about this plan is that it's really hard to understand what's going on. I'm afraid no one is going to be able to maintain this function in a few months. We should think about ways to improve this.
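For readers of this thread, a hedged sketch of what "joint graph + gradient accumulation" means in a scan: the backward of combine_fn runs once per step in reverse, chaining the carry gradient and accumulating gradients of loop-invariant additional inputs. Names and the manual backward are illustrative, not the actual scan.py code.

```python
import torch

def combine_fn(carry, x, weight):
    return carry * weight + x, carry  # (next_carry, output)

def joint_step(carry, x, weight, g_next_carry, g_out):
    # Manual backward of one combine_fn step (for illustration only).
    g_carry = g_next_carry * weight + g_out
    g_x = g_next_carry.clone()
    g_weight = g_next_carry * carry
    return g_carry, g_x, g_weight

weight = torch.randn(2)
carries = [torch.randn(2) for _ in range(3)]   # carries entering each step
xs = [torch.randn(2) for _ in range(3)]
g_carry = torch.ones(2)                        # gradient w.r.t. the last carry
g_outs = [torch.zeros(2) for _ in range(3)]    # gradients w.r.t. the outputs

g_xs, g_weight_acc = [], torch.zeros_like(weight)
for t in reversed(range(3)):
    g_carry, g_x, g_w = joint_step(carries[t], xs[t], weight, g_carry, g_outs[t])
    g_xs.append(g_x)
    g_weight_acc += g_w   # gradient accumulation for the additional input

g_xs = list(reversed(g_xs))  # flip back into forward iteration order
```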
I tried to simplify this in a new version. The new flow is as follows (a sketch illustrating it follows below):
- Create the forward graph and the joint graph of the combine_fn
- Retrace the wrapper of the forward graph so that it returns the carries from all iterations, not just from the last iteration
- Obtain the gradients from the joint_graph and compute the gradient masks
- Retrace the wrapper for the joint_graph using the masks
Is this clearer? I have also added some more comments.
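A minimal sketch of steps 2 and 3 of the flow above, under the assumption of a simple two-output combine_fn; the function and variable names are illustrative and do not mirror the actual scan.py implementation:

```python
import torch

def combine_fn(carry, x):
    return carry + x, carry * x  # (next_carry, output)

def forward_all_carries(init, xs):
    # Step 2: wrapper around the forward graph that returns the carry from
    # every iteration (needed later by the joint graph), not just the last one.
    carry, carries, outs = init, [], []
    for x in xs.unbind(0):
        carry, out = combine_fn(carry, x)
        carries.append(carry)
        outs.append(out)
    return carry, torch.stack(carries), torch.stack(outs)

init = torch.randn(2, requires_grad=True)
xs = torch.randn(4, 2)  # does not require grad

last_carry, all_carries, ys = forward_all_carries(init, xs)

# Step 3: gradient masks record which inputs actually require gradients, so
# the retraced joint-graph wrapper (step 4) can zero-fill the rest.
grad_mask = [t.requires_grad for t in (init, xs)]
print(grad_mask)  # [True, False]
```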
torch/_higher_order_ops/scan.py
Outdated
outs = [
    torch.zeros(
        [num_elems] + list(e.size()),
        dtype=e.dtype,
        device=e.device,
    )
    for i, e in enumerate(dummy_out)
]
idxs = [
    torch.ones_like(e, dtype=torch.int64).unsqueeze(0)
    for i, e in enumerate(dummy_out)
]
Is this change necessary for this PR?
I'd think so. This is for storing the temporary outputs of scan; a sketch of how such buffers could be filled follows below.
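A hypothetical illustration of the buffer usage discussed here (not the actual lowering): one buffer of shape [num_elems, *out.shape] is preallocated per output, and each iteration writes its result into its slot.

```python
import torch

num_elems = 4
dummy_out = [torch.randn(2, 3)]  # example per-iteration output shapes

# Preallocate one buffer per output, as in the snippet above.
outs = [
    torch.zeros([num_elems] + list(e.size()), dtype=e.dtype, device=e.device)
    for e in dummy_out
]

for i in range(num_elems):
    step_result = [torch.full((2, 3), float(i))]  # stand-in per-step output
    for buf, res in zip(outs, step_result):
        buf[i].copy_(res)

print(outs[0][0, 0, 0], outs[0][3, 0, 0])  # tensor(0.) tensor(3.)
```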
@ydwu4 I tried to incorporate all your requests and I think the PR is ready for another round.
Mainly have some concerns about the organization of the documentation.
torch/_higher_order_ops/scan.py
Outdated
# The flipping back along the scan dimension is required to get the gradients in the right order for ``xs``
g_xs = [torch.flip(elem, [0]) for elem in g_xs]

# The gradients for additional inputs that are not tensors are replaced with None.
Mention the partial grad handling notes.
I referred to that note.
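For context on the partial-grad handling being discussed, a minimal hypothetical sketch (the helper name is illustrative, not the actual scan.py code): non-tensor additional inputs get None as their gradient, and tensors that do not require grad get zeros.

```python
import torch

def mask_additional_input_grads(grads, additional_inputs):
    # Illustrative partial-gradient masking for additional inputs.
    masked = []
    for g, inp in zip(grads, additional_inputs):
        if not isinstance(inp, torch.Tensor):
            masked.append(None)            # non-tensor closure value
        elif not inp.requires_grad:
            masked.append(torch.zeros_like(inp))
        else:
            masked.append(g)
    return masked

weight = torch.randn(3, requires_grad=True)
bias = torch.randn(3, requires_grad=False)
scale = 2  # non-tensor value captured by combine_fn
grads = [torch.ones(3), torch.ones(3), None]
print(mask_additional_input_grads(grads, [weight, bias, scale]))
```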
test/functorch/test_control_flow.py
Outdated
scan_fct = compile_mode_helper(scan, compile_mode)

x = torch.randn(3, 1, 2)
@parametrize("reverse", [False, True])
No need to parametrize device here, I feel.
test/functorch/test_control_flow.py
Outdated
def test_scan_init_scanned_0(self, compile_mode):
    scan_fct = compile_mode_helper(scan, compile_mode)

@parametrize("device", [torch.device("cpu"), torch.device("cuda")])
No need to parametrize device here, I feel.
):
    dim = 1
    x = torch.randn(3, 10, 7, device=device, requires_grad=autograd)
    h1 = torch.randn(3, 7, device=device, requires_grad=autograd)
"init_carries_unequal_grad" meaning init and carry have different require_grad? or something else?
Looks good! Left some comments for more clarity of the doc. Also need to fix the test failure.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed: trunk / linux-focal-rocm-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4). Details for the Dev Infra team: raised by workflow job.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 1 check: trunk / linux-focal-rocm-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This PR introduces the Autograd feature for scan with partial gradient support. It is a combination of the already opened PRs: #135631 and bohnstingl#4
Pull Request resolved: #146285
Approved by: https://github.com/ydwu4
Co-authored-by: Yidi Wu <yidi@meta.com>
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @ydwu4