Fixes in Imagenet training script (#1224) · pytorch/examples@a848347
Commit a848347

Fixes in Imagenet training script (#1224)
Fixes:
* ngpus_per_node count: set to 0 (not 1) when no GPU is available
* dist_backend assertions added for known nccl limitations
1 parent 76cd9d0 commit a848347

File tree

1 file changed, +5 -1 lines changed


imagenet/main.py (5 additions & 1 deletion)
```diff
@@ -106,8 +106,12 @@ def main():

     if torch.cuda.is_available():
         ngpus_per_node = torch.cuda.device_count()
+        assert not (ngpus_per_node == 1 and args.dist_backend == "nccl"),\
+            "nccl backend requires GPU count>1, see https://github.com/NVIDIA/nccl/issues/103 perhaps use 'gloo'"
     else:
-        ngpus_per_node = 1
+        ngpus_per_node = 0
+        assert args.dist_backend != "nccl",\
+            "nccl backend does not work without GPU, see https://pytorch.org/docs/stable/distributed.html"
     if args.multiprocessing_distributed:
         # Since we have ngpus_per_node processes per node, the total world_size
         # needs to be adjusted accordingly
```
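For context, here is a minimal sketch of how a caller could apply the same checks this commit enforces, assuming an `args` namespace with a `dist_backend` attribute as in the training script. The `pick_backend` helper and the `argparse.Namespace` setup below are hypothetical illustrations, not part of the repository:

```python
import argparse

import torch


def pick_backend(args: argparse.Namespace) -> str:
    """Hypothetical helper mirroring the commit's assertions: nccl needs
    more than one GPU (see NVIDIA/nccl#103) and cannot run without a GPU."""
    # Same count logic as the patched main(): 0 GPUs when CUDA is unavailable.
    ngpus_per_node = torch.cuda.device_count() if torch.cuda.is_available() else 0
    if args.dist_backend == "nccl" and ngpus_per_node <= 1:
        # These are exactly the conditions the new assertions reject;
        # fall back to 'gloo' instead of crashing.
        return "gloo"
    return args.dist_backend


if __name__ == "__main__":
    args = argparse.Namespace(dist_backend="nccl")
    # On a CPU-only or single-GPU machine this prints "gloo".
    print(pick_backend(args))
```

Falling back to 'gloo' is only one possible design; the commit itself prefers to fail fast with an assertion that points at the relevant issue and documentation.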
