[WIP] suggest whitelist for dynamic shape recompilations #153442

Draft
pianpwk wants to merge 10 commits into main
Conversation

@pianpwk (Contributor) commented on May 13, 2025:

Adds more processing of recompilation reasons to detect tensor sources that recompiled because of dynamic shape changes, and suggests a dynamic-source whitelist to reduce recompilations.

Refactors GuardDebugInfo to hold separate verbose_code_parts and failure_reasons fields: the former contains eval-able code, the latter plain-English reasons. Previously both were combined in a single field, which made it hard to decide what to eval or pattern-match.
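
A minimal sketch of the split (field names come from the description above; the real GuardDebugInfo in torch/_dynamo/guards.py carries more state, and this is not its actual definition):

import dataclasses

@dataclasses.dataclass
class GuardDebugInfo:
    # Sketch only: illustrates the two separated fields described above.
    result: bool
    # eval-able code parts, e.g. "self.mode == 'a'"
    verbose_code_parts: list[str]
    # plain-English reasons, e.g. "tensor 'x' size mismatch at index 0. expected 7, actual 9"
    failure_reasons: list[str]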

For this toy example:

import torch

class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(4, 4)
        self.attr = torch.randn(4)
        self.mode = "a"
    def forward(self, x, y):
        if self.mode == "a":
            return self.lin(x * 2) + self.attr + y
        else:
            return self.lin(x / 2) - self.attr - y

# 1: first call compiles forward with static shapes and mode "a"
m = Foo()
fn = torch.compile(m)
fn(torch.randn(7, 4), torch.randn(4))

# 2: changing mode flips the attribute guard and triggers the first recompile
fn.mode = "b"
fn(torch.randn(7, 4), torch.randn(4))

# 3: new module/attribute/input sizes trigger the second recompile with multiple size mismatches
fn.lin = torch.nn.Linear(8, 5)
fn.attr = torch.randn(5)
fn.mode = "b"
fn(torch.randn(9, 8), torch.randn(5))
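
The logs below were produced with the TORCH_LOGS environment variable; the same artifacts can also be enabled programmatically (standard torch._logging usage, independent of this PR):

# Programmatic equivalent of TORCH_LOGS="recompiles" / TORCH_LOGS="recompiles_verbose".
import torch._logging

torch._logging.set_logs(recompiles=True)
# torch._logging.set_logs(recompiles_verbose=True)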

logs with TORCH_LOGS="recompiles":
(first recompile)

V0512 19:49:21.987000 2692736 torch/_dynamo/guards.py:3451] [0/1] [__recompiles] Recompiling function forward in /data/users/pianpwk/pytorch/custom_tests/test_recompiles_tlparse.py:14
V0512 19:49:21.987000 2692736 torch/_dynamo/guards.py:3451] [0/1] [__recompiles]     triggered by the following guard failure(s):
V0512 19:49:21.987000 2692736 torch/_dynamo/guards.py:3451] [0/1] [__recompiles]     - 0/0: self.mode == 'a'                                       

(second recompile)

V0512 19:49:22.098000 2692736 torch/_dynamo/guards.py:3451] [0/2] [__recompiles] Recompiling function forward in /data/users/pianpwk/pytorch/custom_tests/test_recompiles_tlparse.py:14
V0512 19:49:22.098000 2692736 torch/_dynamo/guards.py:3451] [0/2] [__recompiles]     triggered by the following guard failure(s):
V0512 19:49:22.098000 2692736 torch/_dynamo/guards.py:3451] [0/2] [__recompiles]     - 0/1: tensor 'x' size mismatch at index 0. expected 7, actual 9
V0512 19:49:22.098000 2692736 torch/_dynamo/guards.py:3451] [0/2] [__recompiles]     - 0/0: self.mode == 'a'                                       
V0512 19:49:22.098000 2692736 torch/_dynamo/guards.py:3451] [0/2] [__recompiles] 
V0512 19:49:22.098000 2692736 torch/_dynamo/guards.py:3451] [0/2] [__recompiles]     Multiple size mismatches found. The following environment variable would enable dynamic compilation to start, avoiding this recompile: TORCH_COMPILE_DYNAMIC_SOURCES="L['x'],L['y'],L['self'].attr,L['self']._modules['lin']._parameters['bias'],L['self']._modules['lin']._parameters['weight']"
V0512 19:49:22.098000 2692736 torch/_dynamo/guards.py:3451] [0/2] [__recompiles]     Size guard failed on a parameter, consider using torch._dynamo.config.force_parameter_static_shapes = False to allow dynamism on parameters.

logs with TORCH_LOGS="recompiles_verbose":
(first recompile)

V0512 19:51:39.792000 2709017 torch/_dynamo/guards.py:3449] [0/1] [__recompiles_verbose] Recompiling function forward in /data/users/pianpwk/pytorch/custom_tests/test_recompiles_tlparse.py:14
V0512 19:51:39.792000 2709017 torch/_dynamo/guards.py:3449] [0/1] [__recompiles_verbose]     triggered by the following guard failure(s):
V0512 19:51:39.792000 2709017 torch/_dynamo/guards.py:3449] [0/1] [__recompiles_verbose]     guard 0 failures:
V0512 19:51:39.792000 2709017 torch/_dynamo/guards.py:3449] [0/1] [__recompiles_verbose]     - 0/0: self.mode == 'a'

(second recompile)

V0512 19:51:39.865000 2709017 torch/_dynamo/guards.py:3449] [0/2] [__recompiles_verbose] Recompiling function forward in /data/users/pianpwk/pytorch/custom_tests/test_recompiles_tlparse.py:14
V0512 19:51:39.865000 2709017 torch/_dynamo/guards.py:3449] [0/2] [__recompiles_verbose]     triggered by the following guard failure(s):
V0512 19:51:39.865000 2709017 torch/_dynamo/guards.py:3449] [0/2] [__recompiles_verbose]     guard 0 failures:
V0512 19:51:39.865000 2709017 torch/_dynamo/guards.py:3449] [0/2] [__recompiles_verbose]     - 0/1: tensor 'x' size mismatch at index 0. expected 7, actual 9; tensor 'y' size mismatch at index 0. expected 4, actual 5; tensor 'self.attr' size mismatch at index 0. expected 4, actual 5; tensor 'self._modules['lin']._parameters['bias']' size mismatch at index 0. expected 4, actual 5; tensor 'self._modules['lin']._parameters['weight']' size mismatch at index 0. expected 4, actual 5
V0512 19:51:39.865000 2709017 torch/_dynamo/guards.py:3449] [0/2] [__recompiles_verbose] 
V0512 19:51:39.865000 2709017 torch/_dynamo/guards.py:3449] [0/2] [__recompiles_verbose]     guard 1 failures:
V0512 19:51:39.865000 2709017 torch/_dynamo/guards.py:3449] [0/2] [__recompiles_verbose]     - 0/0: self.mode == 'a'                                       ; tensor 'self.attr' size mismatch at index 0. expected 4, actual 5; tensor 'self._modules['lin']._parameters['bias']' size mismatch at index 0. expected 4, actual 5; tensor 'self._modules['lin']._parameters['weight']' size mismatch at index 0. expected 4, actual 5; tensor 'x' size mismatch at index 0. expected 7, actual 9; tensor 'y' size mismatch at index 0. expected 4, actual 5
V0512 19:51:39.865000 2709017 torch/_dynamo/guards.py:3449] [0/2] [__recompiles_verbose] 
V0512 19:51:39.865000 2709017 torch/_dynamo/guards.py:3449] [0/2] [__recompiles_verbose]     Multiple size mismatches found. The following environment variable would enable dynamic compilation to start, avoiding this recompile: TORCH_COMPILE_DYNAMIC_SOURCES="L['x'],L['y'],L['self'].attr,L['self']._modules['lin']._parameters['bias'],L['self']._modules['lin']._parameters['weight']"
V0512 19:51:39.865000 2709017 torch/_dynamo/guards.py:3449] [0/2] [__recompiles_verbose]     Size guard failed on a parameter, consider using torch._dynamo.config.force_parameter_static_shapes = False to allow dynamism on parameters.
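
One way to act on the suggestion (a sketch: the env var and the force_parameter_static_shapes flag are the ones named in the logs above; the script name is hypothetical):

# Sketch: apply the suggested dynamic-source whitelist from the logs above.
# Env-backed configs may be read when torch is imported, so exporting the variable
# in the shell before launching Python is the safest route, e.g.:
#   TORCH_COMPILE_DYNAMIC_SOURCES="L['x'],L['y'],..." python your_script.py   # your_script.py is hypothetical
# Setting it from Python before importing torch should be equivalent:
import os

os.environ["TORCH_COMPILE_DYNAMIC_SOURCES"] = (
    "L['x'],L['y'],L['self'].attr,"
    "L['self']._modules['lin']._parameters['bias'],"
    "L['self']._modules['lin']._parameters['weight']"
)

import torch

# The logs also note the size guard failed on parameters, which are static by default.
torch._dynamo.config.force_parameter_static_shapes = False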

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames

@pytorch-bot (bot) commented on May 13, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153442

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 6 New Failures

As of commit a11c15e with merge base 3aa8477:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pianpwk changed the title from "initw" to "[WIP] suggest whitelist for dynamic shape recompilations" on May 13, 2025
"Replicate": Replicate,
"Partial": Partial,
"DeviceMesh": DeviceMesh,
})
@pianpwk (Contributor, Author) commented on May 13, 2025:
eval was failing due to local imports here:

if torch.distributed.is_available():
    from torch.distributed.device_mesh import DeviceMesh
    from torch.distributed.tensor.placement_types import (
        Partial,
        Replicate,
        Shard,
    )

    ok_types = ok_types + (
        Shard,
        Replicate,
        Partial,
        DeviceMesh,
    )
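
For context, guard code parts are eval'd against a fixed namespace, so a name that only exists as a local import inside the enclosing function raises NameError during eval. A toy illustration (plain Python, not the actual guards.py eval path):

# Toy illustration: eval of a code part fails when the referenced type
# is only available as a local import.
def make_value():
    from collections import OrderedDict  # local import, invisible to the eval below
    return OrderedDict()

code_part = "isinstance(x, OrderedDict)"
try:
    eval(code_part, {}, {"x": make_value()})
except NameError as err:
    print(err)  # name 'OrderedDict' is not defined

# Fix direction: make the name available in the namespace handed to eval.
from collections import OrderedDict
print(eval(code_part, {"OrderedDict": OrderedDict}, {"x": make_value()}))  # True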

code_part = f"{ref}.__tensor_flatten__()[1] == {original_metadata}"
self.get_guard_manager(guard).add_lambda_guard(
    metadata_checker, get_verbose_code_parts(code_part, guard)
)
@pianpwk (Contributor, Author) commented:

__check_metadata wasn't registered as a callable in closure_vars, so the guard's code part wasn't eval-able.
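
A similar toy sketch for this one (the helper name mirrors the comment above; the real registration lives in the closure_vars namespace in guards.py):

# Toy sketch: a code part that calls a helper is only eval-able if the helper
# is registered in the namespace passed to eval (the analogue of closure_vars).
def __check_metadata(checker, tensor):
    return checker(tensor)

closure_vars = {"__check_metadata": __check_metadata}
code_part = "__check_metadata(lambda t: t is not None, x)"
print(eval(code_part, closure_vars, {"x": object()}))  # True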
