Scheduler Flops refactor by exclamaforte · Pull Request #152708 · pytorch/pytorch · GitHub

Scheduler Flops refactor #152708

Closed
wants to merge 4 commits from exclamaforte/scheduler-flops-refactor

Conversation

exclamaforte (Contributor) commented May 2, 2025

This refactors estimate_flops and get_estimated_runtime on scheduler nodes:

  1. New function on BaseSchedulerNode: estimate_flops. It now works with all types of IR nodes, not just ExternalKernels.
  2. Extends get_estimated_runtime to work with non-ExternalKernels.

Prelude to: #149697

Testing:
New unit tests cover functionality.
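
As a rough usage sketch of the refactored API (illustrative only: summarize_scheduler is a hypothetical helper, and the loop mirrors what the new unit tests do rather than reproducing the exact implementation):

def summarize_scheduler(scheduler):
    # estimate_flops() now lives on BaseSchedulerNode and may return None
    # when no estimate is available for a given node.
    total_flops = 0
    for node in scheduler.nodes:
        flops = node.estimate_flops()
        if flops is not None:
            total_flops += flops
    # get_estimated_runtime() is likewise no longer limited to ExternalKernels.
    total_runtime = sum(node.get_estimated_runtime() for node in scheduler.nodes)
    return total_flops, total_runtime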

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

pytorch-bot bot commented May 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152708

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit ff251ba with merge base 61dd2a0:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

eellison (Contributor) commented May 2, 2025

should fix #147137

eellison (Contributor) left a comment

nice, looks good! just one comment about testing

Comment on lines 68 to 80
gm = make_fx(op)(*example_inputs, **kwargs)
reference_flops = get_total_flops(mode)

graph = GraphLowering(gm)

with V.set_graph_handler(graph), V.set_debug_handler(DebugContext()):
    graph.run(*example_inputs, **kwargs)
    graph.init_wrapper_code()
    graph._update_scheduler()
    # Sum the per-node estimates from the new BaseSchedulerNode.estimate_flops()
    scheduler_flops = 0
    for node in graph.scheduler.nodes:
        flops = node.estimate_flops()
        scheduler_flops += flops if flops is not None else 0
Contributor

nit: can we just make this a metric we store on counters? I would rather we just run torch.compile here.
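
A rough sketch of that suggestion, assuming a hypothetical counter key (counters["inductor"]["flops"] is illustrative; the PR does not add such a metric):

import torch
from torch._dynamo.utils import counters

# Hypothetical: if the scheduler recorded its FLOP estimate into a counter,
# the test could simply run torch.compile and compare against the eager
# FlopCounterMode reference instead of driving GraphLowering by hand.
counters.clear()
compiled = torch.compile(op)
compiled(*example_inputs, **kwargs)
scheduler_flops = counters["inductor"]["flops"]  # assumed counter name
assert scheduler_flops == reference_flops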

xmfan (Member) left a comment

Thanks for the extension!

exclamaforte requested a review from eellison on May 7, 2025 08:39
exclamaforte force-pushed the exclamaforte/scheduler-flops-refactor branch from 41d36ba to 76ffe99 on May 7, 2025 20:51
exclamaforte force-pushed the exclamaforte/scheduler-flops-refactor branch from 76ffe99 to ff251ba on May 8, 2025 00:33
eellison (Contributor) left a comment

nice!!

exclamaforte (Contributor, Author) commented

@pytorchbot merge

pytorch-bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) on May 9, 2025
pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

laithsakka (Contributor) commented May 10, 2025

Note: this PR adds the following ~10% regression:
{
    "mm_loop_inductor_dynamic_gpu": 9.9632850491536,
    "mm_loop_inductor_gpu": -5.0718946821206
}
cc @eellison

[Screenshot attached: 2025-05-09, 10:20 PM]

eellison (Contributor) commented

Can we avoid invoking estimate_runtime() when it's not needed? Also, can we cache the flops estimation for a particular op and input shapes?
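
One way the caching idea could look, as a sketch (the key layout and the placeholder estimator below are assumptions for illustration, not what the PR or Inductor actually does):

import functools

@functools.lru_cache(maxsize=None)
def cached_flop_estimate(op_name: str, input_shapes: tuple) -> int:
    # Placeholder estimator: real code would dispatch to per-op FLOP formulas.
    if op_name == "aten.mm":
        (m, k), (_, n) = input_shapes
        return 2 * m * k * n
    return 0

# Repeated calls with the same op and shapes hit the cache instead of recounting.
flops = cached_flop_estimate("aten.mm", ((1024, 512), (512, 256)))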

exclamaforte (Contributor, Author) commented

@eellison yeah, I think get_estimated_runtime is called for what amounts to some logging code in most cases, which probably shouldn't be happening:
https://github.com/pytorch/pytorch/blob/main/torch/_inductor/compile_fx.py#L1383
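
A minimal sketch of the "don't call it when it isn't needed" idea (the guard and the helper name maybe_log_estimated_runtime are assumptions, not the actual compile_fx change):

import logging

log = logging.getLogger(__name__)

def maybe_log_estimated_runtime(scheduler) -> None:
    # Only pay for get_estimated_runtime() when the logging that consumes
    # it is actually enabled.
    if not log.isEnabledFor(logging.DEBUG):
        return
    total = sum(node.get_estimated_runtime() for node in scheduler.nodes)
    log.debug("estimated total runtime: %s", total)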
