dist.gather_object does not correctly compute rank in coordinator check · Issue #118337 · pytorch/pytorch · GitHub

@mvpatel2000

Description

🐛 Describe the bug

my_rank = get_rank()

This line should pass in the group, i.e. get_rank(group), so that it returns the process's rank within the group instead of its global rank. Otherwise, is_coordinator checks in sharded checkpoint loading will fail when a subgroup is used.
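The mismatch can be illustrated without a running process group. The sketch below is a pure-Python model under stated assumptions: a subgroup is represented as the ordered list of global ranks it contains, a process's group rank is its index in that list, and the coordinator is identified by a group rank. The helpers group_rank, is_coordinator_buggy and is_coordinator_fixed are hypothetical and not part of PyTorch; in torch.distributed itself, the group-local rank is what get_rank(group) returns for a member of that group.

```python
# Pure-Python model of the rank mismatch; NOT the torch.distributed API.
# Assumption: a subgroup is represented as the ordered list of global ranks
# it contains, and a process's group rank is its index in that list.
from typing import List


def group_rank(group_ranks: List[int], global_rank: int) -> int:
    """Group-local rank of `global_rank`, or -1 if it is not in the group
    (the same non-member convention as torch.distributed.get_rank(group))."""
    return group_ranks.index(global_rank) if global_rank in group_ranks else -1


def is_coordinator_buggy(global_rank: int, coordinator_group_rank: int) -> bool:
    # Reported pattern: the global rank (what get_rank() returns with no
    # group argument) is compared against a group-local coordinator rank.
    return global_rank == coordinator_group_rank


def is_coordinator_fixed(
    group_ranks: List[int], global_rank: int, coordinator_group_rank: int
) -> bool:
    # Suggested pattern: translate to the group-local rank before comparing.
    return group_rank(group_ranks, global_rank) == coordinator_group_rank


if __name__ == "__main__":
    subgroup = [2, 3]  # hypothetical subgroup of a 4-process job
    coordinator = 0    # group rank 0, i.e. global rank 2
    for r in subgroup:
        print(r, is_coordinator_buggy(r, coordinator),
              is_coordinator_fixed(subgroup, r, coordinator))
```

With a subgroup of global ranks [2, 3] and group rank 0 as coordinator, the global-rank comparison marks neither process as coordinator, while the group-rank comparison selects global rank 2. When the group is the default world group, the two ranks coincide, which is why the problem only surfaces with subgroups.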

Versions

Nightly

cc @ezyang @gchanan @zou3519 @kadeng @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @LucasLLC
