-
Notifications
You must be signed in to change notification settings - Fork 24.2k
[torch][ao] Do not crash numerics debugger if the shape of the tensors do not match #149330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149330
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 New Failure, 1 Unrelated FailureAs of commit c755508 with merge base 5e9f792 ( NEW FAILURE - The following job has failed:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This pull request was exported from Phabricator. Differential Revision: D66245053 |
This is a redo of #141257 but with a different author who has permissions |
9225c5d
to
c420608
Compare
…s do not match (pytorch#149330) Summary: Occasionally we see the loss function to crash because the shapes of `a` and `b` tensors are different. This diff avoids crashing is such scenarios and lets the comparison work for other nodes where the shapes match. Test Plan: - CI Reviewed By: jerryzh168 Differential Revision: D66245053
This pull request was exported from Phabricator. Differential Revision: D66245053 |
c420608
to
b98c1a4
Compare
…s do not match (pytorch#149330) Summary: Occasionally we see the loss function to crash because the shapes of `a` and `b` tensors are different. This diff avoids crashing is such scenarios and lets the comparison work for other nodes where the shapes match. Test Plan: - CI Reviewed By: jerryzh168 Differential Revision: D66245053
b98c1a4
to
454c61d
Compare
…s do not match (pytorch#149330) Summary: Occasionally we see the loss function to crash because the shapes of `a` and `b` tensors are different. This diff avoids crashing is such scenarios and lets the comparison work for other nodes where the shapes match. Test Plan: - CI Reviewed By: jerryzh168 Differential Revision: D66245053
This pull request was exported from Phabricator. Differential Revision: D66245053 |
1 similar comment
This pull request was exported from Phabricator. Differential Revision: D66245053 |
454c61d
to
6ee86c4
Compare
…s do not match (pytorch#149330) Summary: Pull Request resolved: pytorch#149330 Occasionally we see the loss function to crash because the shapes of `a` and `b` tensors are different. This diff avoids crashing is such scenarios and lets the comparison work for other nodes where the shapes match. Test Plan: - CI Reviewed By: jerryzh168 Differential Revision: D66245053
…s do not match (pytorch#149330) Summary: Pull Request resolved: pytorch#149330 Occasionally we see the loss function to crash because the shapes of `a` and `b` tensors are different. This diff avoids crashing is such scenarios and lets the comparison work for other nodes where the shapes match. Test Plan: - CI Reviewed By: jerryzh168 Differential Revision: D66245053
6ee86c4
to
c755508
Compare
@pytorchbot merge |
This PR needs to be approved by an authorized maintainer before merge. |
@jerryzh168 can you stamp this? I need an authorized reviewer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are those cases? can you add a test btw?
This wasn't my diff, I commandeered it from someone who didn't have permissions to land, it was previously accepted internally. This was mostly to avoid a very large exception message relating to printing tensors instead of their shapes |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud |
I see, then should there be a raise Excpetion under the if statement |
"Cannot compare tensors with different shapes: actual=%s vs ref=%s", | ||
a.shape, | ||
b.shape, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we raise exception here instead of ignore the error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, because this type of error would typically indicate the debug handle was placed on the wrong nodes to compare before and after. Debug handles should always produce tensors of the same shape.
But debug handle errors should not prevent reporting of other accuracy errors, which an exception would do
if not isinstance(a, torch.Tensor) | ||
or not isinstance(b, torch.Tensor) | ||
or a.shape == b.shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also with this, some of the mismatches will be ignored, that's probably not what we want I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to ignore mismatches if they were comparing this of different types, or when the tensors have different shapes, because it's a problem with the debug handles at that point
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
Summary: Occasionally we see the loss function to crash because the shapes of
a
andb
tensors are different. This diff avoids crashing is such scenarios and lets the comparison work for other nodes where the shapes match.Test Plan: - CI
Reviewed By: jerryzh168