[PP] Allow unused kwargs in ZB path by H-Huang · Pull Request #153498 · pytorch/pytorch · GitHub

[PP] Allow unused kwargs in ZB path #153498


Open · wants to merge 2 commits into base: gh/H-Huang/179/base

Conversation

H-Huang (Member) commented May 13, 2025

Stack from ghstack (oldest at bottom):

This fixes a case where an unused kwarg is passed to the PP stage forward: we call torch.autograd.grad() on it and try to update its gradients even though it shouldn't have gradients, leading to this error:

[rank3]:[rank3]: File "/data/users/howardhuang/pytorch/torch/distributed/pipelining/stage.py", line 613, in
[rank3]:[rank3]: return lambda: stage_backward_input(
[rank3]:[rank3]: File "/data/users/howardhuang/pytorch/torch/distributed/pipelining/_backward.py", line 199, in stage_backward_input
[rank3]:[rank3]: dinputs = torch.autograd.grad(
[rank3]:[rank3]: File "/data/users/howardhuang/pytorch/torch/autograd/__init__.py", line 503, in grad
[rank3]:[rank3]: result = _engine_run_backward(
[rank3]:[rank3]: File "/data/users/howardhuang/pytorch/torch/autograd/graph.py", line 824, in _engine_run_backward
[rank3]:[rank3]: return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
[rank3]:[rank3]: RuntimeError: One of the differentiated Tensors does not require grad

related issues: pytorch/torchtitan#1188, #153484
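A minimal sketch of the failure mode and the shape of the fix (this is an illustration, not the PR's actual implementation; the `forward` function and tensor names here are hypothetical): `torch.autograd.grad` raises when any tensor in `inputs` does not require grad, so the zero-bubble backward path has to differentiate only with respect to inputs that actually require gradients.

```python
import torch

# Hypothetical stand-in for a PP stage forward that ignores one kwarg.
def stage_forward(x, unused_kwarg=None):
    return x * 2

x = torch.randn(4, requires_grad=True)
unused = torch.randn(4)  # stage input that does not require grad

# Passing every stage input to torch.autograd.grad fails, because
# `unused` is not part of the autograd graph:
out = stage_forward(x, unused_kwarg=unused)
try:
    torch.autograd.grad(out.sum(), (x, unused))
except RuntimeError as e:
    # "One of the differentiated Tensors does not require grad"
    print(e)

# Sketch of the fix: filter to inputs that require grad before
# calling torch.autograd.grad.
out = stage_forward(x, unused_kwarg=unused)
inputs = [t for t in (x, unused) if t.requires_grad]
dinputs = torch.autograd.grad(out.sum(), inputs)
print(dinputs[0])  # gradient w.r.t. x only
```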

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k

[ghstack-poisoned]
pytorch-bot bot commented May 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153498

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

H-Huang added a commit that referenced this pull request May 13, 2025
ghstack-source-id: 7228c9b
Pull Request resolved: #153498
@pytorch-bot pytorch-bot bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label May 13, 2025
@H-Huang H-Huang marked this pull request as draft May 13, 2025 22:21
@H-Huang H-Huang added the module: pipelining Pipeline Parallelism label May 13, 2025
@H-Huang H-Huang added the release notes: distributed (pipeline) release notes category label May 14, 2025
@H-Huang H-Huang changed the title wip [PP] Allow unused kwargs in ZB path May 14, 2025
@H-Huang H-Huang marked this pull request as ready for review May 14, 2025 17:38
@H-Huang H-Huang requested a review from kwen2501 May 14, 2025 19:23
Labels
module: pipelining (Pipeline Parallelism) · oncall: distributed · release notes: distributed (pipeline)