Fix one_hot inconsistent errors after compile by zeshengzong · Pull Request #146466 · pytorch/pytorch · GitHub

Fix one_hot inconsistent errors after compile #146466


Open

zeshengzong wants to merge 3 commits into main from fix/aten/one_hot

Conversation

zeshengzong (Contributor) commented Feb 5, 2025

Fixes #146274

Test Result

>>> import torch
>>> f = torch.nn.functional.one_hot
>>> a = torch.arange(0, 5) % 3  # [0,1,2,0,1]
>>> num_classes = 0
>>> torch.nn.functional.one_hot(a,num_classes)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Class values must be smaller than num_classes.

>>> torch.compile(torch.nn.functional.one_hot)(a,num_classes)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zong/code/pytorch/torch/_dynamo/eval_frame.py", line 570, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/zong/code/pytorch/torch/_dynamo/external_utils.py", line 48, in inner
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
RuntimeError: Class values must be smaller than num_classes.
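
For comparison, with a valid num_classes both the eager and compiled runs agree (illustrative output, same session as above):

>>> torch.nn.functional.one_hot(a, 3).shape
torch.Size([5, 3])
>>> torch.compile(torch.nn.functional.one_hot)(a, 3).shape
torch.Size([5, 3])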

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @bdhirsh

pytorch-bot bot commented Feb 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146466

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit fa04893 with merge base de68ddc:

NEW FAILURE - The following job has failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

zeshengzong (Contributor Author) commented:

@pytorchbot label "topic: not user facing"

@zeshengzong zeshengzong marked this pull request as ready for review February 5, 2025 06:26
@pytorch-bot pytorch-bot bot added the topic: not user facing label Feb 5, 2025
zou3519 (Contributor) left a comment


Looks good if the tests pass, but I wonder if the reason the error checks weren't there before is that they aren't compile-friendly.

@cpuhrsch cpuhrsch added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Feb 8, 2025
@cpuhrsch cpuhrsch requested a review from bdhirsh February 8, 2025 01:37
zeshengzong (Contributor Author) commented:

Should compile's behavior be consistent with eager mode for parameter checks? There are other issues about this kind of difference, like #144183. But if no error is raised, users may not be aware that they are getting a wrong result. Thanks! :D

bdhirsh (Contributor) commented Feb 10, 2025

@zeshengzong can you add a test? The simplest place would probably be in test_repros.py - you can include a basic test that tries to compile one_hot with an invalid num_classes and asserts that it raises an error.
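
A minimal sketch of what such a test might look like (hypothetical test name and placement; it assumes the error message shown in the eager run above):

import torch
from torch._dynamo.test_case import TestCase, run_tests

class OneHotReproTest(TestCase):
    def test_one_hot_invalid_num_classes(self):
        def fn(x):
            # num_classes=0 is invalid: class values can never be smaller than it
            return torch.nn.functional.one_hot(x, num_classes=0)

        x = torch.arange(0, 5) % 3
        # Eager raises today; after the fix, compile should raise the same
        # error instead of silently returning a wrong result.
        with self.assertRaisesRegex(RuntimeError, "smaller than num_classes"):
            fn(x)
        with self.assertRaisesRegex(RuntimeError, "smaller than num_classes"):
            torch.compile(fn, backend="eager")(x)

if __name__ == "__main__":
    run_tests()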

bdhirsh (Contributor) commented Feb 10, 2025

(The extra checks here look compile-friendly - the decomp itself is already not very compile-friendly, since it induces an h2d sync through the .item() call - but the check itself is just an assert on top of that.)
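
For context, a rough Python paraphrase of the eager logic under discussion (the real implementation is C++ in ATen; names and structure here are illustrative, not the actual patch):

import torch

def one_hot_sketch(self, num_classes=-1):
    if num_classes == -1:
        # This .item() reads a scalar back from the device - the sync
        # mentioned above. It exists with or without the new checks.
        num_classes = int(self.max().item()) + 1
    if num_classes <= 0:
        raise RuntimeError("num_classes should be positive")
    # The new check is a plain assert layered on top of that value:
    if int(self.max().item()) >= num_classes:
        raise RuntimeError("Class values must be smaller than num_classes.")
    return torch.nn.functional.one_hot(self, num_classes)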

zeshengzong (Contributor Author) commented:

Test case added, thanks!

zeshengzong (Contributor Author) commented:

@bdhirsh Hello, is there a way to test for FakeTensor in ATen? We might need to skip these checks for it. Thanks!
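
For reference, the FakeTensor constraint can be observed from the Python side (illustrative only; the ATen code below would need the C++ equivalent of such a guard):

import torch
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    t = torch.empty(5, dtype=torch.long)  # a FakeTensor: shape/dtype only, no data
    try:
        t.max().item()  # data-dependent: a fake tensor has no values to read
    except Exception as e:
        print(type(e).__name__)  # a data-dependent-output error is expected here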

num_classes = self.max().item().toLong() + 1;
} else {
check_num_classes(self, num_classes);
Contributor commented on this diff:

@zeshengzong the test failures are probably coming from the fact that there are places where your code now calls .item() where it previously did not.

Under compile, we shouldn't try to raise the error here if computing it requires an h2d sync (via the item() call). So the first thing you'll have to do is tweak this condition to only run in the same cases where item() was used before.

From looking at the code, I think we can always do the "num_classes should be positive" check, but we can only do the "Class values must be smaller than num_classes" check if the num_classes field has already been computed.
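
One possible reading of that suggestion, as a Python stand-in for the C++ (can_read_values is a hypothetical helper, not an existing API; the real patch would need the ATen equivalent):

import torch
from torch._subclasses.fake_tensor import FakeTensor

def can_read_values(t):
    # Hypothetical guard: rule out tensors whose data cannot be read,
    # e.g. fake/meta tensors seen during compile tracing.
    return not isinstance(t, FakeTensor) and not t.is_meta

def one_hot_checked(self, num_classes=-1):
    if num_classes != -1 and num_classes <= 0:
        # Value-independent check: needs no tensor data, always safe.
        raise RuntimeError("num_classes should be positive")
    if num_classes == -1:
        # Pre-existing sync point: infer the class count from the data.
        num_classes = int(self.max().item()) + 1
    elif can_read_values(self):
        # Data-dependent check, run only where reading values is allowed.
        if int(self.max().item()) >= num_classes:
            raise RuntimeError("Class values must be smaller than num_classes.")
    return torch.nn.functional.one_hot(self, num_classes)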

@zeshengzong zeshengzong force-pushed the fix/aten/one_hot branch 2 times, most recently from 26fc162 to 0bfac06 on February 14, 2025 07:57
zeshengzong (Contributor Author) commented:

@pytorchbot rebase -b main

pytorchmergebot (Collaborator) commented:

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

pytorchmergebot (Collaborator) commented:

Successfully rebased fix/aten/one_hot onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout fix/aten/one_hot && git pull --rebase)

shaoyuyoung (Contributor) commented:

Hi, any updates on this fix?

Labels
module: dynamo, open source, topic: not user facing, triaged
Development

Successfully merging this pull request may close these issues.

torch.nn.functional.one_hot has inconsistent behavior between eager and torch.compile when num_classes=0
7 participants