[export] Refactor pt2 save/load by angelayi · Pull Request #152495 · pytorch/pytorch · GitHub
Open · angelayi wants to merge 1 commit into main from angelayi/export_save
Conversation

@angelayi (Contributor) commented Apr 30, 2025

Refactor pt2 archive saving to consolidate the formats used by torch.export.save and torch._inductor.package.package_aoti.

This PR adds the following functions, which torch.export.save and AOTI packaging call into:

package_pt2(
    f: FileLike,
    *,
    exported_programs: Optional[Union[ExportedProgram, dict[str, ExportedProgram]]] = None,
    aoti_files: Optional[Union[list[str], dict[str, list[str]]]] = None,
    extra_files: Optional[dict[str, Any]] = None,
) -> FileLike

@dataclass
class PT2ArchiveContents:
    exported_programs: dict[str, ExportedProgram]
    aoti_runners: dict[str, AOTICompiledModel]
    extra_files: dict[str, Any]

load_pt2(f: FileLike) -> PT2ArchiveContents

Power users can call these APIs directly if they want to bundle multiple exported programs, AOTI files, or extra metadata.
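Since package_pt2 accepts either a single ExportedProgram or a name-to-program dict, the writer presumably normalizes to a dict up front. A minimal pure-Python sketch of that normalization, with the assumption (not shown in this PR) that a bare program is keyed under a hypothetical default name "model":

```python
from typing import Any, Optional, Union


def normalize_exported_programs(
    exported_programs: Optional[Union[Any, dict[str, Any]]],
    default_name: str = "model",  # hypothetical default key, not from the PR
) -> dict[str, Any]:
    # package_pt2 accepts one program or a name->program dict; normalizing
    # to a dict up front keeps the archive-writing code path uniform.
    if exported_programs is None:
        return {}
    if isinstance(exported_programs, dict):
        return dict(exported_programs)
    return {default_name: exported_programs}


# Two bundled models end up as two named entries in the archive.
print(sorted(normalize_exported_programs({"model1": object(), "model2": object()})))
```

The same shape would apply to the aoti_files argument, which is likewise a Union of a flat list and a per-model dict.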

This is what the pt2 archive looks like (spec):

├── archive_format
├── version
├── .data
├── data
│   ├── aotinductor
│   │   └── model1
│   │       ├── model1.cpp
│   │       ├── model1.so  # currently AOTI automatically moves weights in here, TODO to move it out
│   │       ├── cg7domx3woam3nnliwud7yvtcencqctxkvvcafuriladwxw4nfiv.cubin
│   │       └── cubaaxppb6xmuqdm4bej55h2pftbce3bjyyvljxbtdfuolmv45ex.cubin
│   ├── weights
│   │   ├── model1.pt  # TODO to dedup weights between model1/model2
│   │   └── model2.pt
│   ├── constants
│   │   ├── model1.pt
│   │   └── model2.pt
│   └── sample_inputs
│       ├── model1.pt
│       └── model2.pt
├── extra
│   └── user_metadata.txt
└── models
    ├── model1.json
    └── model2.json
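The layout above can be exercised without torch at all. A minimal sketch, assuming the pt2 archive is a plain zip container (paths follow the spec tree above; the file contents here are placeholders, not the real serialized formats):

```python
import io
import zipfile


def sketch_package_pt2(buffer: io.BytesIO) -> None:
    # Writes the directory layout from the spec tree into a zip container.
    # The real package_pt2 serializes ExportedPrograms and AOTI artifacts;
    # placeholder bytes stand in for them here.
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("archive_format", "pt2")
        zf.writestr("version", "0")
        zf.writestr("models/model1.json", "{}")
        zf.writestr("data/weights/model1.pt", b"")
        zf.writestr("extra/user_metadata.txt", "hello")


def sketch_load_pt2(buffer: io.BytesIO) -> dict:
    # Mirrors the extra_files field of PT2ArchiveContents: returns the
    # user-supplied extra files keyed by their name under extra/.
    with zipfile.ZipFile(buffer) as zf:
        assert zf.read("archive_format").decode() == "pt2"
        return {
            name.removeprefix("extra/"): zf.read(name).decode()
            for name in zf.namelist()
            if name.startswith("extra/")
        }


buf = io.BytesIO()
sketch_package_pt2(buf)
print(sketch_load_pt2(buf))
```

Round-tripping through an in-memory buffer like this is also how f: FileLike can be either a path or a file object in the proposed API.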

Future todos:

  • unbundle the weights -- instead of .pt, we can use bin files, which will also allow us to dedup weights if we store multiple models
  • update aoti_compile_and_package to also save the exported program
  • integrate TNR with this packaging flow

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

pytorch-bot bot commented Apr 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152495

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 7 Unrelated Failures

As of commit 98cdf80 with merge base ce317cd:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@angelayi angelayi force-pushed the angelayi/export_save branch from 75ee02b to 98cdf80 Compare May 1, 2025 16:20
@angelayi angelayi marked this pull request as ready for review May 1, 2025 16:46
@facebook-github-bot (Contributor) commented:
@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 1, 2025
@angelayi angelayi requested a review from SherlockNoMad May 1, 2025 22:25
@isaaccorley commented:

@angelayi this is perfect! Can't wait to use this

@desertfire (Contributor) left a comment:

CI test failure is real.

@@ -1876,7 +1875,7 @@ def _pad_to_alignment(raw_bytes: bytes) -> bytes:
         magic_number = 0
     else:
         magic_number = cast(
-            int, torch.randint(0, torch.iinfo(torch.int64).max, (1,)).item()
+            "int", torch.randint(0, torch.iinfo(torch.int64).max, (1,)).item()
Contributor:

why this change?

*,
expected_opset_version: Optional[dict[str, int]] = None,
run_single_threaded: bool = False,
num_runners: int = 1,
Contributor:

Need to pick up changes in #152093

@desertfire desertfire self-requested a review May 13, 2025 14:56