Accuracy issue in torch inductor · Issue #153299 · pytorch/pytorch · GitHub
Accuracy issue in torch inductor #153299

Open
naveenkumarmarri opened this issue May 10, 2025 · 9 comments
Assignees: eellison
Labels: module: inductor, oncall: pt2, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module), ubn ("unbreak now", our utmost priority label)

Comments

naveenkumarmarri commented May 10, 2025

🐛 Describe the bug

I am noticing an accuracy difference when training with torch.compile. To narrow down the issue, I ran an ablation with the eager, aot_eager, and inductor backends and observed that the numerics diverge when using backend=inductor.
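
A minimal sketch of this kind of backend ablation (the model and input below are placeholders, not the actual training setup):

import torch
import torch.nn as nn

# Placeholder model and input standing in for the real training setup.
model = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 1))
x = torch.randn(8, 64)

ref = model(x)  # eager reference
for backend in ("eager", "aot_eager", "inductor"):
    out = torch.compile(model, backend=backend)(x)
    print(backend, (out - ref).abs().max().item())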

It looks like this is happening inside inductor. To reproduce, I added repro.py and minified_launcher.py. Let me know if you need any additional information that would help reproduce this.

Error logs

No response

Versions

I ran the repro minifier and the scripts are available here:
repro.py - https://gist.github.com/naveenkumarmarri/0f66d89695e56840a06c7a37dccca83f
minifier.py - https://gist.github.com/naveenkumarmarri/8fa35e72e3210a6c6b13548d9ab73df6

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @aakhundov

naveenkumarmarri changed the title from "Accuracy issue in torch dynamo" to "Accuracy issue in torch inductor" on May 10, 2025
@masnesral (Contributor)

@naveenkumarmarri can you also try with torch._inductor.config.emulate_precision_casts=True
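
A minimal sketch of setting that flag, assuming it is applied globally before torch.compile is called (the model below is a placeholder):

import torch
import torch.nn as nn
import torch._inductor.config as inductor_config

# Make inductor emit the same precision casts eager mode performs (the flag suggested above).
inductor_config.emulate_precision_casts = True

model = nn.Linear(4, 4)  # placeholder model
compiled_model = torch.compile(model, backend="inductor")
out = compiled_model(torch.randn(2, 4))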

@naveenkumarmarri (Author)

@masnesral adding the flag still results in the accuracy gap

@masnesral (Contributor)

@naveenkumarmarri does the repro script actually work for you? For example: https://gist.github.com/naveenkumarmarri/0f66d89695e56840a06c7a37dccca83f#file-repro-py-L67

@navmarri14

This script is autogenerated by enabling:

export TORCHDYNAMO_REPRO_AFTER=aot
export TORCHDYNAMO_REPRO_LEVEL=4

I followed the torch.compile docs to get the script. Let me know if there is a better way to generate the repro script that might be more helpful for reproducing this.

@masnesral (Contributor)

@navmarri14, I dunno; I'm kind of a noob with minifier support, but the script looks invalid to me. Is this use case using either post_grad_custom_pre_pass or post_grad_custom_post_pass by chance? I don't think the minifier works if you're using those customizations.

navmarri14 commented May 13, 2025

@masnesral I don't have post_grad_custom_pre_pass or post_grad_custom_post_pass as part of the model. I was able to narrow down the issue to the following operation.

# compiled model leads to mismatch in accuracy
def fn(x): 
  return torch.logsumexp(x, dim=-1).pow(2).mean()

If I disable compilation for this specific operation, the results match the uncompiled model.

@torch._dynamo.disable(recursive=True)
def fn(x): 
  return torch.logsumexp(x, dim=-1).pow(2).mean()

I tried to reproduce this with a simple model definition, but the results match between the compiled and uncompiled models.

import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()
        
    def forward(self, x):
        return torch.logsumexp(x, dim=-1).pow(2).mean()


device = torch.device("cuda")
model = ToyModel().to(device)
x = torch.load("shift_logits.pt", map_location=device)
uncompiled_loss = model(x)
print(f"uncompiled_loss: {uncompiled_loss.item()}")

model = torch.compile(model, fullgraph=False, backend="inductor")
compiled_loss = model(x)
print(f"compiled_loss: {compiled_loss.item()}")
print(f"compiled equal to uncompiled: {torch.equal(uncompiled_loss, compiled_loss)}")

prints

uncompiled_loss: 36.5
compiled_loss: 36.5
compiled equal to uncompiled: True

cc: @ezyang in case you have seen this issue before or could share a better approach to sharing a repro
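
Side note: torch.equal checks exact bitwise equality; when comparing compiled vs. eager numerics it can also help to check against an explicit tolerance. A sketch reusing uncompiled_loss and compiled_loss from the script above:

# Tolerance-based comparison in addition to the exact torch.equal check;
# raises AssertionError if the values differ beyond rtol/atol.
torch.testing.assert_close(compiled_loss, uncompiled_loss, rtol=1e-5, atol=1e-6)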

@masnesral (Contributor)

Ok, I was able to get the repro working just by commenting out the bad lines: https://gist.github.com/masnesral/f5c9afeb24247838e7fb7812b1f47bc7

I also ran the compiler bisector, but it didn't find a culprit:

All subsystems in inductor have been checked. The issue is not in this system.
The issue is in the inductor system, but could not identify subsystem.

cc @eellison I think we'd consider this ubn.

masnesral added the ubn ("unbreak now", our utmost priority label) and triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels on May 14, 2025
@navmarri14

@masnesral are there any references on how to fix these kinds of issues? I can make a PR if possible.

@eellison (Contributor)

I started looking at this; I can take it.

eellison self-assigned this on May 16, 2025