Labels: module: pipelining (Pipeline Parallelism), oncall: distributed
Description
🐛 Describe the bug
When running PyTorch pipeline parallelism (PP), the backward-step code requires every input to a stage's forward to have a gradient. This is too strict and should perhaps be relaxed so that at least one input requiring grad is sufficient.
```python
def forward(self, param1, param2, ...):
    # currently, both param1 and param2 must require grad
```
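A minimal sketch of the underlying autograd behavior (the module and tensor names here are illustrative, not from the report): when one input does not require grad, autograd leaves its `.grad` as `None`, and that `None` is the gradient the pipeline stage then tries to send upstream.

```python
import torch
import torch.nn as nn

class TwoInputBlock(nn.Module):
    # Stand-in for a pipeline stage whose forward takes two inputs.
    def forward(self, a, b):
        return (a * b).sum()

block = TwoInputBlock()
a = torch.randn(4, requires_grad=True)  # e.g. activations from the previous stage
b = torch.randn(4)                      # e.g. a mask: requires_grad=False

loss = block(a, b)
loss.backward()
print(a.grad)  # a tensor: gradient flowed to a
print(b.grad)  # None: b never required grad, so there is nothing to send back
```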
Error message:
```
RuntimeError: [7] for chunk 0 has gradients None and is expecting to send gradients to stage 6,
```
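Until the check is relaxed, one possible workaround (my own suggestion, not something the report prescribes; `force_requires_grad` is a hypothetical helper) is to force every floating-point input to require grad before it enters the schedule, so each stage boundary has a real gradient to send:

```python
import torch

def force_requires_grad(*tensors):
    # Hypothetical helper: turn every floating-point input into a
    # grad-requiring leaf so the PP backward never sees a None gradient
    # at a stage boundary. Non-float tensors are passed through as-is.
    return tuple(
        t.requires_grad_() if t.is_floating_point() and not t.requires_grad else t
        for t in tensors
    )

x = torch.randn(4)
mask = torch.ones(4)
x, mask = force_requires_grad(x, mask)  # both now require grad
```

The trade-off is extra memory for gradient buffers on tensors that do not actually need gradients, which is why relaxing the check itself seems preferable.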
Versions
PyTorch trunk
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k