[multigraph] use backend specializations in compile_and_call_fx_graph #152601
Conversation
…ll_fx_graph" The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM who does this in a somewhat hacky way where they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs. There's really two parts of this work: **The frontend changes:** 1) we introduce an optional kwarg `backend_specializations` to mark_dynamic that takes in a list of specializations. I debated other methods including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem including graph breaks, lazy initialization of variable trackers and symbolic variables, etc. **The backend changes (this PR):** 1) We capture the backend_specialization specified in the mark_dynamic API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py` 2) After we are done dynamo tracing, we invoke `call_user_compiler` N + 1 times for N specializations and 1 generic graph. Under the hood this will call compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. 3) When we have specializations, we install a specialized dispatch function that checks each specialization and dispatches to the first one that matches. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (eg. if you have 8 specializations, you would hit the cache limit) 2) it naturally incorporates the hierarchical lattice structure of the guards since the specializations are always necessarily stricter than the generic region's guards. I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions:  [ghstack-poisoned]
…ll_fx_graph" The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM who does this in a somewhat hacky way where they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs. There's really two parts of this work: **The frontend changes:** 1) we introduce an optional kwarg `backend_specializations` to mark_dynamic that takes in a list of specializations. I debated other methods including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem including graph breaks, lazy initialization of variable trackers and symbolic variables, etc. **The backend changes (this PR):** 1) We capture the backend_specialization specified in the mark_dynamic API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py` 2) After we are done dynamo tracing, we invoke `call_user_compiler` N + 1 times for N specializations and 1 generic graph. Under the hood this will call compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. 3) When we have specializations, we install a specialized dispatch function that checks each specialization and dispatches to the first one that matches. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (eg. if you have 8 specializations, you would hit the cache limit) 2) it naturally incorporates the hierarchical lattice structure of the guards since the specializations are always necessarily stricter than the generic region's guards. I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions:  [ghstack-poisoned]
…ll_fx_graph" The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM who does this in a somewhat hacky way where they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs. There's really two parts of this work: **The frontend changes:** 1) we introduce an optional kwarg `backend_specializations` to mark_dynamic that takes in a list of specializations. I debated other methods including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem including graph breaks, lazy initialization of variable trackers and symbolic variables, etc. **The backend changes (this PR):** 1) We capture the backend_specialization specified in the mark_dynamic API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py` 2) After we are done dynamo tracing, we invoke `call_user_compiler` N + 1 times for N specializations and 1 generic graph. Under the hood this will call compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. 3) When we have specializations, we install a specialized dispatch function that checks each specialization and dispatches to the first one that matches. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (eg. if you have 8 specializations, you would hit the cache limit) 2) it naturally incorporates the hierarchical lattice structure of the guards since the specializations are always necessarily stricter than the generic region's guards. I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions:  [ghstack-poisoned]
…ll_fx_graph" The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM who does this in a somewhat hacky way where they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs. There's really two parts of this work: **The frontend changes:** 1) we introduce an optional kwarg `backend_specializations` to mark_dynamic that takes in a list of specializations. I debated other methods including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem including graph breaks, lazy initialization of variable trackers and symbolic variables, etc. **The backend changes (this PR):** 1) We capture the backend_specialization specified in the mark_dynamic API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py` 2) After we are done dynamo tracing, we invoke `call_user_compiler` N + 1 times for N specializations and 1 generic graph. Under the hood this will call compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. 3) When we have specializations, we install a specialized dispatch function that checks each specialization and dispatches to the first one that matches. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (eg. if you have 8 specializations, you would hit the cache limit) 2) it naturally incorporates the hierarchical lattice structure of the guards since the specializations are always necessarily stricter than the generic region's guards. I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions:  [ghstack-poisoned]
…ll_fx_graph" The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM who does this in a somewhat hacky way where they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs. There's really two parts of this work: **The frontend changes:** 1) we introduce an optional kwarg `backend_specializations` to mark_dynamic that takes in a list of specializations. I debated other methods including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem including graph breaks, lazy initialization of variable trackers and symbolic variables, etc. **The backend changes (this PR):** 1) We capture the backend_specialization specified in the mark_dynamic API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py` 2) After we are done dynamo tracing, we invoke `call_user_compiler` N + 1 times for N specializations and 1 generic graph. Under the hood this will call compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. 3) When we have specializations, we install a specialized dispatch function that checks each specialization and dispatches to the first one that matches. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (eg. if you have 8 specializations, you would hit the cache limit) 2) it naturally incorporates the hierarchical lattice structure of the guards since the specializations are always necessarily stricter than the generic region's guards. I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions:  [ghstack-poisoned]
raise RuntimeError(
    "Backend specializations are only supported for contiguous tensors."
)
what's going on here?
for specialization in old_fake_mode.shape_env.backend_specializations:
    source_index = sources.index(specialization.source)
    check_fn_source = inspect.getsource(specialization.check_fn).strip()
    check_fn = guards.LAMBDA_GUARD(  # type: ignore[attr-defined]
        specialization.check_fn,
        [check_fn_source],
    )

    log.debug(
        "Compiling backend specialized graph with specialization=%s",
        check_fn_source,
    )

    specialized_compiles.append(
        (
            functools.partial(
                lambda idx, args, check_fn=check_fn: check_fn(
                    args[idx]
                ),
                source_index,
            ),
            self.call_user_compiler(gm, specialization=specialization),
This calls the backend compiler with the (tensor_args,) for this graph and the specialization argument, right?
I'm not sure this is the right design. The (tensor_args,) don't have the same shape as the specialization -- will that be a problem?
An alternative design is that there is some lazy dispatching layer right after Dynamo but before AOTAutograd. Let's say the user calls the following for the first time:
# A
x = torch.randn(3)
mark_dynamic(x, 0, backend_specializations=[1, 2])
torch.compile(f)(x)
Then this traces out a graph from Dynamo with dynamic shapes.
Then, on future calls to torch.compile:
# B
y = torch.randn(1)
torch.compile(f)(y)
- On seeing a specialized shape for the first time: this skips Dynamo but directly forwards the args (y,) to the backend to compile a graph
# C
z = torch.randn(1)
torch.compile(f)(z)
- On seeing a specialized shape again: this pulls up the graph the backend compiled for said shape.
One way to implement this is:
- Let's think about the Dynamo cache as a mapping from guards to a callable
- After (A), there is a guard for each of the specializations: {"batch_size==1": call_backend_compile(), "batch_size==2": call_backend_compile(), "batch_size==anything_else": compiled_artifact}
- (B) hits the call_backend_compile() function, which will compile a backend function and replace the Dynamo cache entry with {"batch_size==1": compiled_artifact}
- Future hits to this guard (e.g. C) will just hit the compiled artifact; a rough sketch of this cache structure follows below.
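A minimal, self-contained sketch of that lazy cache idea. The names here (`LazySpecializedEntry`, `make_cache`, `dispatch`) are hypothetical, not the actual Dynamo cache structures; `backend_compile` stands in for compile_fx and `dynamo_gm` for the graph Dynamo captured once.

```python
# Hypothetical sketch only: model the Dynamo cache as (guard, entry) pairs where
# specialized entries start as lazy stubs that compile on the first hit and then
# reuse the compiled artifact on later hits.

class LazySpecializedEntry:
    def __init__(self, backend_compile, dynamo_gm):
        self.backend_compile = backend_compile  # stand-in for compile_fx
        self.dynamo_gm = dynamo_gm              # graph captured once by Dynamo
        self.compiled = None                    # filled on the first matching call

    def __call__(self, *args):
        if self.compiled is None:
            # First time this specialized shape is seen: skip Dynamo and hand
            # the real args straight to the backend.
            self.compiled = self.backend_compile(self.dynamo_gm, args)
        return self.compiled(*args)

def make_cache(dynamo_gm, backend_compile, generic_compiled):
    return [
        (lambda args: args[0].shape[0] == 1, LazySpecializedEntry(backend_compile, dynamo_gm)),
        (lambda args: args[0].shape[0] == 2, LazySpecializedEntry(backend_compile, dynamo_gm)),
        (lambda args: True, generic_compiled),  # the generic dynamic-shape artifact
    ]

def dispatch(cache, *args):
    for guard, entry in cache:
        if guard(args):
            return entry(*args)
```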
The benefit of the alternative lazy design is that the backend doesn't need to work hard to figure out how to do the specialization: it's almost like calling regular torch.compile again, except it is able to skip Dynamo.
One side effect is that we don't have to impose constraints on the strides (this PR needs to do that because it needs to figure out how to create a FakeTensor, right?)
I think this makes sense. cc @anijain2305 for thoughts as well
There are a few details that we need to think about:
- We will have multiple cache entries per code object here. For example, our cache size limit is 8, but this specialization will require us to raise the cache size limit for certain code objects.
- "Dynamo cache as a mapping from guards to a callable" is true, but there is a subtle difference: Dynamo maps guards to bytecode, and that bytecode contains the call to the compiled graph (not the FX graph, a compiled graph). So in this design we will have to figure out how to (1) stash the bytecode and (2) stash the Dynamo graph.
- Overwriting a cache entry is also questionable. Maybe we have bytecode that calls `backend_compile`, and `backend_compile` internally checks whether there is already compiled code. If yes, run the compiled code; otherwise run the AOT + Inductor compilation.
@anijain2305 thoughts on #153449?
What is the issue with the current implementation? It's not bad, and it gives a hierarchical feel, which kind of makes sense in this case.
Consider the following code:
x = torch.randn(3)
mark_dynamic(x, 0, backend_specializations=[1, 2])
torch.compile(f)(x)
x = torch.randn(1)
torch.compile(f)(x)
x = torch.randn(2)
torch.compile(f)(x)
On the first torch.compile call, we will attempt to compile all of the backend specializations, but that torch.compile call only has one set of sample inputs (of shape [3]). The problems I'm worried about are:
a) Compile time will be slow up front. On the first torch.compile it looks like we call the backend compiler three times.
b) Because there are no real tensor inputs of shape [1] and shape [2], we need to guess at those tensors and assume that they're contiguous. This doesn't seem very good.
The lazier design (#153449) solves this by (a) deferring compilation of shape [1] and shape [2] until we actually see inputs of those shapes, and (b) simply recompiling if the strides change.
Agree with @zou3519's comments.
backend_specializations=[
    (16, lambda x0: x0 == 16),
],
Do we have an API flow for when you want to specify conditions on multiple vars? E.g.
lambda x, y: x == 1 and y == 1
lambda x, y: x % 16 and y % 16
You don't necessarily want to specialize on x == 1 and y % 16, which I assume would fall out of the pairwise specializations.
Not at the moment. vLLM actually only has one symbolic variable (https://www.anyscale.com/blog/continuous-batching-llm-inference) so we don't need to worry about that for our first customer. That being said, I'm happy to bikeshed what a better multi-var API may look like during composability.
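For reference, one purely hypothetical shape such a multi-var flow could take (not part of this PR; `joint_backend_specializations` below is an invented name, just to make the cross-product concern concrete):

```python
import torch
from torch._dynamo import mark_dynamic

x = torch.randn(16, 1280)
y = torch.randn(16, 1280)
mark_dynamic(x, 0)
mark_dynamic(y, 0)

# Hypothetical: one list of joint checks over both marked sizes, so only the
# intended combinations get specialized, rather than per-dimension
# specializations whose pairwise combinations would also cover unwanted cases
# such as x == 1 together with y % 16 == 0.
joint_backend_specializations = [
    lambda sx, sy: sx == 1 and sy == 1,
    lambda sx, sy: sx % 16 == 0 and sy % 16 == 0,
]
```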
dynamic_specialized = do_bench(
    lambda: inductor_matmul(dynamic_specialized_a, b)
)
We should check the output code
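For instance, something along these lines (a sketch using the existing `run_and_get_code` helper; the `inductor_matmul` definition and the FileCheck pattern are placeholders, since the real assertion depends on what the specialized kernel should contain, and the actual test runs on GPU rather than CPU):

```python
# Sketch only: capture and inspect the Inductor output code for the benchmarked fn.
import torch
from torch._inductor.utils import run_and_get_code
from torch.testing import FileCheck

@torch.compile
def inductor_matmul(a, b):  # placeholder for the function used in the test
    return a @ b

a = torch.randn(16, 1280)
b = torch.randn(1280, 16)
result, codes = run_and_get_code(inductor_matmul, a, b)
# Assert on the generated wrapper code, e.g. that a call wrapper was emitted.
FileCheck().check("def call").run(codes[0])
```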
@@ -1 +1 @@
-14256e6040d9e14698a877924456cdd92bfcd01d
+8eeef7f5b5363e9f35576184659226cc082311d6
intentional?
m = 16
k = 1280
dynamic_a = torch.randn(m, k, device="cuda", dtype=torch.bfloat16)
Hi, the function is decorated with `requires_gpu`, which means the case will run on GPUs like CUDA/XPU, but the hard-coded `cuda` here will fail on other GPUs like XPU.
- dynamic_a = torch.randn(m, k, device="cuda", dtype=torch.bfloat16)
+ dynamic_a = torch.randn(m, k, device=GPU_TYPE, dtype=torch.bfloat16)
m = 16
k = 1280
dynamic_a = torch.randn(m, k, device="cuda", dtype=torch.bfloat16)
dynamic_specialized_a = torch.randn(m, k, device="cuda", dtype=torch.bfloat16)
- dynamic_specialized_a = torch.randn(m, k, device="cuda", dtype=torch.bfloat16)
+ dynamic_specialized_a = torch.randn(m, k, device=GPU_TYPE, dtype=torch.bfloat16)
k = 1280
dynamic_a = torch.randn(m, k, device="cuda", dtype=torch.bfloat16)
dynamic_specialized_a = torch.randn(m, k, device="cuda", dtype=torch.bfloat16)
b = torch.randn(k, m, device="cuda", dtype=torch.bfloat16)
- b = torch.randn(k, m, device="cuda", dtype=torch.bfloat16)
+ b = torch.randn(k, m, device=GPU_TYPE, dtype=torch.bfloat16)
Abandoning in favor of the lazy approach: #153449
Stack from ghstack (oldest at bottom):

The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM, which does this in a somewhat hacky way: it uses a custom backend to capture a dynamo graph and then manually invokes compile_fx multiple times to get specialized graphs.

There are really two parts to this work:

**The frontend changes:**
1) We introduce an optional kwarg `backend_specializations` to `mark_dynamic` that takes in a list of specializations. I debated other methods, including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem, including graph breaks, lazy initialization of variable trackers and symbolic variables, etc.

**The backend changes (this PR):**
1) We capture the backend_specialization specified in the mark_dynamic API into a SymbolicContext. See the changes in `/_dynamo/variables/builder.py`.
2) After dynamo tracing is done, we invoke `call_user_compiler` N + 1 times for N specializations and 1 generic graph. Under the hood this calls compile_fx, which composes nicely with both Async Compile and AOTAutogradCache.
3) When we have specializations, we install a specialized dispatch function that checks each specialization and dispatches to the first one that matches. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (e.g. with 8 specializations you would hit the cache limit) and 2) it naturally incorporates the hierarchical lattice structure of the guards, since the specializations are always necessarily stricter than the generic region's guards.

I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions.
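To make point 3 concrete, here is a simplified, self-contained sketch of the specialize-then-dispatch flow. The helper names (`build_dispatcher`, `compile_for`) are hypothetical; the real logic lives in compile_and_call_fx_graph and call_user_compiler.

```python
# Simplified sketch of the eager specialize-then-dispatch flow described above.
# compile_for is a stand-in for call_user_compiler / compile_fx.
def build_dispatcher(gm, specializations, compile_for):
    # Compile every specialized graph plus the generic one up front (N + 1 calls).
    specialized = [
        (check_fn, compile_for(gm, specialization=spec))
        for spec, check_fn in specializations
    ]
    generic = compile_for(gm, specialization=None)

    def dispatch(*args):
        for check_fn, compiled_fn in specialized:
            # e.g. guard on the size of the marked dynamic dim; the real
            # implementation looks up the recorded source index instead.
            if check_fn(args[0].shape[0]):
                return compiled_fn(*args)
        return generic(*args)

    return dispatch
```

Because each specialization's guard is strictly tighter than the generic region's guards, falling through to the generic graph is always safe, which is the hierarchical lattice structure mentioned above.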