[BE]: Remove redundant copy in torch chunk shard (#144269) · pytorch/pytorch@b5cf8e2 · GitHub
Commit b5cf8e2

Skylion007 authored and pytorchmergebot committed
[BE]: Remove redundant copy in torch chunk shard (#144269)
Fixes an issue noticed in a recent all_gather PR: some parts of the codebase perform a double copy via `clone().contiguous()`, which can be fused into a single copy op.

Pull Request resolved: #144269
Approved by: https://github.com/awgu
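For illustration, here is a minimal standalone sketch of the fusion this commit applies. It is not from the PR itself; the tensor `x` and the surrounding setup are hypothetical, chosen only to make the copy counts observable:

```python
import torch

# Hypothetical input: a transposed view, hence non-contiguous.
x = torch.arange(12.0).reshape(3, 4).t()
assert not x.is_contiguous()

# Before: clone() defaults to memory_format=torch.preserve_format, so on a
# non-contiguous input it produces a non-contiguous copy; .contiguous() then
# allocates and fills a second, contiguous copy.
two_copies = x.clone().contiguous()

# After: clone() writes directly into freshly allocated contiguous memory,
# performing a single copy.
one_copy = x.clone(memory_format=torch.contiguous_format)

# Both forms yield the same values and a contiguous result.
assert two_copies.is_contiguous() and one_copy.is_contiguous()
assert torch.equal(two_copies, one_copy)
```

Note that for an already-contiguous input, `.contiguous()` is a no-op, so the double copy only occurs on non-contiguous shards; the fused form is never worse.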
1 parent 1b8a943 · commit b5cf8e2

1 file changed: +3, −1 lines

torch/distributed/_shard/sharding_spec/chunk_sharding_spec.py

Lines changed: 3 additions & 1 deletion

```diff
@@ -162,7 +162,9 @@ def shard(
                     narrowed_tensor.detach().clone().resize_(scatter_shape)
                 )
             else:
-                tensor_to_scatter = narrowed_tensor.detach().clone().contiguous()
+                tensor_to_scatter = narrowed_tensor.detach().clone(
+                    memory_format=torch.contiguous_format
+                )
 
             tensors_to_scatter[
                 dist.get_group_rank(process_group, remote_global_rank)
```
