
Ensure conj/neg flags are set in destination for CUDA->CPU copies #147231


Open
wants to merge 6 commits into
base: gh/amjames/20/base

Conversation

@amjames (Collaborator) commented Feb 14, 2025

[ghstack-poisoned]
pytorch-bot (bot) commented Feb 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147231

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures

As of commit 5610b04 with merge base f2221b2:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

amjames added a commit that referenced this pull request Feb 14, 2025
@amjames (Collaborator, Author) commented Feb 14, 2025

@pytorchbot label "topic: not user facing"

@@ -44,6 +45,20 @@ def test_conj_copy(self, device, dtype):
x1.copy_(xc1)
self.assertEqual(x1, torch.tensor([5 - 1j, 2 - 2j], device=device, dtype=dtype))

@onlyCUDA
Collaborator

Why is this test onlyCUDA? Couldn't we parameterize it for all device backends?

amjames (Collaborator, Author)

You need a CUDA tensor to be the source of the copy and a CPU tensor to be the destination. The issue is only present on cross-device copies.
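
To illustrate the scenario being exercised, here is a minimal sketch in plain Python (not the exact test in this PR; it assumes a CUDA device is available and uses the default complex dtype):

import torch

# Conjugated CUDA source: conj() only sets a lazy flag, it does not touch the data.
src = torch.tensor([5 + 1j, 2 + 2j], device="cuda").conj()
assert src.is_conj()

# Plain (non-conj) CPU destination; pinned memory so the non_blocking copy can be async.
dst = torch.empty(2, dtype=src.dtype).pin_memory()

dst.copy_(src, non_blocking=True)
torch.cuda.synchronize()

# The destination must hold the materialized conjugate values.
print(dst)  # expected: tensor([5.-1.j, 2.-2.j])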

Collaborator

@amjames For this specific issue, sure, but I am just pointing out that we could expand test coverage to copy on XPU, TPU, etc. A CPU-to-CPU copy should also be valid, right?

Skylion007 (Collaborator) commented Feb 17, 2025

Or is copy_ not implemented on all devices? Or is it that this behavior is buggy on other devices? Or worse, does it throw an error depending on the support for non-blocking?

Collaborator

Also one minor nit: non_blocking could be parameterized, or at the very least executed as a subtest (see the sketch below).
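
A sketch of the subtest variant (illustrative names only, assuming the same device/dtype plumbing as the existing test_conj_copy; the parametrize helper in torch.testing._internal.common_utils would be the usual alternative for generating separate cases):

def test_conj_copy_cross_device(self, device, dtype):
    for non_blocking in (False, True):
        with self.subTest(non_blocking=non_blocking):
            src = torch.tensor([5 + 1j, 2 + 2j], device=device, dtype=dtype).conj()
            dst = torch.empty(2, device="cpu", dtype=dtype)
            dst.copy_(src, non_blocking=non_blocking)
            torch.cuda.synchronize()
            self.assertEqual(dst, torch.tensor([5 - 1j, 2 - 2j], dtype=dtype))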

Collaborator

Oh wait, it looks like the current test suite only runs on CPU/CUDA anyway, so this is fine.

amjames requested a review from janeyx99 February 17, 2025
Skylion007 previously approved these changes Feb 17, 2025
Skylion007 requested a review from ngimel February 17, 2025
Skylion007 dismissed their stale review February 17, 2025

Actually, this is a pretty perf-sensitive part of the codebase, so I'll defer.

@@ -449,10 +449,10 @@ static void copy_kernel_cuda(TensorIterator& iter, bool non_blocking) {
}

if (iter.tensor(0).is_conj() != iter.tensor(1).is_conj()) {
iter.tensor(0).conj_physical_();
iter.tensor(0)._set_conj(iter.tensor(1).is_conj());
Collaborator

You cannot change the conj flag on the destination tensor, because the copy is in-place and you are not allowed to change the attributes that the tensor has. This is the reason conj_physical has been run here in the first place - why do you think we would do an operation on tensor data if we didn't need it and could just flip a flag?
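
For readers less familiar with the conjugate-view mechanism, the distinction the reviewer relies on can be seen in plain torch terms (a sketch, not ATen internals): conj() only flips a flag on a view, while conj_physical_() rewrites the data in place and leaves the flag untouched, which is what an in-place copy into an existing destination is allowed to do.

import torch

x = torch.tensor([1 + 1j, 2 + 2j])
v = x.conj()
print(v.is_conj())   # True  -- lazy view, no data was touched

y = torch.tensor([1 + 1j, 2 + 2j])
y.conj_physical_()
print(y.is_conj())   # False -- the flag is unchanged...
print(y)             # tensor([1.-1.j, 2.-2.j]) -- ...but the data was physically conjugated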

amjames (Collaborator, Author)

Alright, I misunderstood what conj_physical does. With the correct understanding of that, I don't see any way for us to ensure that the conjugation happens on the destination after the copy without blocking. It will have to be resolved beforehand.

We could push this into the branch for copies requiring a temporary, where the conjugate is resolved into a temporary which is then the source for the non-blocking copy.

> why do you think we would do an operation on tensor data if we didn't need it and could just flip a flag

FWIW I thought this might be something we don't allow, but there were no test failures, so I thought it could be an oversight.

Collaborator

I thought there were test cases to check this, but apparently not. We should resolve conj/neg similar to how we resolve dtype via intermediates - the cases with mismatching conj/neg should go through the requires_temporaries branch.
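
In user-facing terms, "resolve through an intermediate" means materializing the lazy conj/neg bits into a temporary before the raw copy, roughly like this Python-level sketch (the actual change lives in copy_kernel_cuda / copy_requires_temporaries; resolve_neg plays the same role for the negative bit):

import torch

src = torch.tensor([1 + 1j, 2 + 2j], device="cuda").conj()
print(src.is_conj())          # True

tmp = src.resolve_conj()      # materializes the conjugation into a plain temporary
print(tmp.is_conj())          # False -- safe source for a raw memcpy-style copy

dst = torch.empty_like(tmp, device="cpu")
dst.copy_(tmp, non_blocking=True)   # no conj/neg mismatch left to fix up afterwards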

[ghstack-poisoned]
amjames added a commit that referenced this pull request Feb 18, 2025
Fixes #146286

ghstack-source-id: 0c44ccf
Pull Request resolved: #147231
if (same_dtype && iter.is_contiguous()) {
// Contiguous same-dtype copies can always use cudaMemcpyAsync
bool same_conj_neg = iter.tensor(0).is_conj() == iter.tensor(1).is_conj() && iter.tensor(0).is_neg() == iter.tensor(1).is_neg();
if (same_dtype && iter.is_contiguous() && same_conj_neg) {
Collaborator

Suggested change:

if (same_dtype && iter.is_contiguous() && same_conj_neg) {
if (same_dtype && iter.is_contiguous() && (same_conj_neg || !non_blocking)) {

@@ -340,8 +344,8 @@ static void copy_kernel_cuda(TensorIterator& iter, bool non_blocking) {

// Enable p2p access between devices. (No-op if it involves the CPU)
bool p2p_enabled = maybe_enable_p2p_access(dst_device, src_device);

if (copy_requires_temporaries(iter, p2p_enabled)) {
bool temp_needed = copy_requires_temporaries(iter, p2p_enabled, non_blocking);
Collaborator

dead code

@@ -355,19 +359,17 @@ static void copy_kernel_cuda(TensorIterator& iter, bool non_blocking) {
auto conversion_device = non_blocking ? kCUDA : kCPU;
if (iter.device_type(1) == conversion_device) {
dst_contig = dst.is_contiguous() ? dst : at::empty_like(dst, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
src_contig = iter.tensor(1).to(iter.dtype(0)).expand_as(dst).contiguous();
src_contig = iter.tensor(1).to(iter.dtype(0)).expand_as(dst).contiguous().resolve_conj();
Collaborator

What if src is_conj is false and dst is_conj is true? resolve_conj will do nothing, dst_contig.copy_(src_contig) will still require temporaries, and you are in an infinite loop.
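
A quick demonstration of the first half of that concern (plain torch, a sketch): resolve_conj() on a source whose conj bit is not set simply returns its input, so it cannot compensate for a destination whose conj bit is set.

import torch

src = torch.tensor([1 + 1j, 2 + 2j])
out = src.resolve_conj()
print(out.data_ptr() == src.data_ptr())   # True -- nothing was materialized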

[ghstack-poisoned]
amjames added a commit that referenced this pull request Mar 14, 2025
Fixes #146286

ghstack-source-id: e990d36
Pull Request resolved: #147231
[ghstack-poisoned]
amjames added a commit that referenced this pull request Mar 14, 2025
Fixes #146286

ghstack-source-id: a23e358
Pull Request resolved: #147231
[ghstack-poisoned]
amjames added a commit that referenced this pull request Mar 14, 2025
Fixes #146286

ghstack-source-id: c042321
Pull Request resolved: #147231
[ghstack-poisoned]
amjames added a commit that referenced this pull request Mar 14, 2025
Fixes #146286

ghstack-source-id: feb4d9b
Pull Request resolved: #147231
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label May 14, 2025