sync with top of pytorch tree #1810
Conversation
…143332)" This reverts commit a9c753b. Reverted #143332 on behalf of https://github.com/malfet due to Surprisingly failure is caused by this PR ([comment](#143332 (comment)))
Ditto. Pull Request resolved: #143685 Approved by: https://github.com/kit1980, https://github.com/seemethere, https://github.com/atalman
This PR adds support for export to unwrap/wrap subclasses AOT so that we can trace through subclass parameters. This will resolve the UX issue in torchao where users had to manually unwrap their subclasses before calling export. Differential Revision: [D67531057](https://our.internmc.facebook.com/intern/diff/D67531057) Pull Request resolved: #141941 Approved by: https://github.com/bdhirsh
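A minimal sketch of the UX this enables, assuming a torchao-style setup: the quantization step that turns `m.linear.weight` into a tensor subclass is elided, and `unwrap_tensor_subclass` in the comments refers to torchao's previous manual workaround.

```python
# Minimal sketch, assuming a model whose parameters are tensor subclasses
# (e.g. after torchao quantization); the quantization step itself is elided.
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.linear(x)

m = M()
# ... quantize m here so that m.linear.weight becomes a tensor subclass ...

# Before this change, torchao users had to unwrap subclass parameters first:
#   from torchao.utils import unwrap_tensor_subclass
#   m = unwrap_tensor_subclass(m)
# With AOT unwrap/wrap support in export, this now works directly:
ep = torch.export.export(m, (torch.randn(2, 8),))
```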
tlparse PR: pytorch/tlparse#83 Pull Request resolved: #141907 Approved by: https://github.com/ezyang
Clearing at dynamo start is an issue because it throws away events from compiled autograd. Pull Request resolved: #143175 Approved by: https://github.com/Skylion007, https://github.com/jamesjwu ghstack dependencies: #141907
When we unflatten, the submodules we generate (`InterpreterModule` or `InterpreterModuleDispatcher`) are not related by type to the original submodules `N`. This makes `isinstance(mod, N)` checks fail. Since we do not have the original types after export, the best we can do is expose a `type_name()` method that carries the original type name, which we do carry in `nn_module_stack` entries. Differential Revision: [D67526542](https://our.internmc.facebook.com/intern/diff/D67526542/) Pull Request resolved: #143664 Approved by: https://github.com/tugsbayasgalan
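A hedged sketch of the check this enables; the exact return value of `type_name()` (fully qualified vs. bare class name) is an assumption here.

```python
import torch

class N(torch.nn.Module):
    def forward(self, x):
        return x + 1

class Top(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.sub = N()

    def forward(self, x):
        return self.sub(x)

ep = torch.export.export(Top(), (torch.randn(2),))
unflat = torch.export.unflatten(ep)
sub = unflat.get_submodule("sub")

print(isinstance(sub, N))  # False: sub is an InterpreterModule, not an N
print(sub.type_name())     # original type name, recovered from nn_module_stack
```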
Test Plan: Sandcastle Differential Revision: D67549758 Pull Request resolved: #143693 Approved by: https://github.com/huydhn
Pull Request resolved: #141748 Approved by: https://github.com/ezyang
Use set_feature_use for logging aot autograd cache so that dynamo_compile has this data as well as PT2 Compile Events. Differential Revision: [D67536293](https://our.internmc.facebook.com/intern/diff/D67536293/) Pull Request resolved: #143674 Approved by: https://github.com/bobrenjc93
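A hedged sketch of the call being wired up; the feature-name string below is illustrative, not necessarily the exact key the PR logs.

```python
# Hedged sketch: record feature usage so it shows up in both dynamo_compile
# and PT2 Compile Events. The feature string below is illustrative only.
from torch._dynamo.utils import set_feature_use

set_feature_use("aot_autograd_cache", True)
```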
Pull Request resolved: #143548 Approved by: https://github.com/yanboliang, https://github.com/jansel, https://github.com/williamwen42
Pull Request resolved: #143567 Approved by: https://github.com/williamwen42, https://github.com/jansel ghstack dependencies: #143548
Fix #143472 Pull Request resolved: #143491 Approved by: https://github.com/desertfire, https://github.com/jansel, https://github.com/EikanWang
…per in runtime. (#142322) This PR aims to remove the dependency on the Intel Compiler at Inductor runtime. Now we only need SYCL_HOME at runtime to find the SYCL headers and libs. Pull Request resolved: #142322 Approved by: https://github.com/EikanWang, https://github.com/desertfire, https://github.com/albanD ghstack dependencies: #143491
Summary: Emit a CMakeLists.txt with compile and link options when package_cpp_only is specified. After unzipping AOTI generated .pt2 package file, user can manually build the generated model code in their local environment. Pull Request resolved: #143680 Approved by: https://github.com/huydhn
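A hedged sketch of the resulting workflow; the packaging entry point and the `aot_inductor.package_cpp_only` config key are assumptions based on current inductor config naming, not a verified recipe.

```python
import torch
from torch._inductor import aoti_compile_and_package

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2

ep = torch.export.export(M(), (torch.randn(4),))
pt2_path = aoti_compile_and_package(
    ep,
    package_path="model.pt2",
    inductor_configs={"aot_inductor.package_cpp_only": True},
)
# The unzipped .pt2 now carries a CMakeLists.txt with compile/link options,
# so the generated model code can be built locally, e.g.:
#   unzip model.pt2 -d model_src && cmake -S model_src -B build && cmake --build build
```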
This reverts commit c7d9f29. Reverted #143402 on behalf of https://github.com/huydhn due to The internal diff D67148738 has been reverted ([comment](#143402 (comment)))
…k memory usage (#143347)" This reverts commit efe21ee. Reverted #143347 on behalf of https://github.com/huydhn due to D67118173 has been backed out internally ([comment](#143347 (comment)))
https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmprli4iy/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100

```
[
  {
    "args": {"compile_id": "0/-/-", "graph_id": 0},
    "cat": "dynamo_timed",
    "name": "compiled_autograd",
    "ph": "B",
    "pid": 0,
    "tid": 0,
    "ts": 1733886868992655.8
  },
  {
    "args": {"compile_id": "0/-/-", "graph_id": 0},
    "cat": "dynamo_timed",
    "name": "compiled_autograd",
    "ph": "E",
    "pid": 0,
    "tid": 0,
    "ts": 1733886869130681.0
  },
  {
    "args": {"compile_id": "0/0/0"},
    "cat": "dynamo_timed",
    "name": "dynamo",
    "ph": "B",
    "pid": 0,
    "tid": 0,
    "ts": 1733886869134350.5
  },
  {
```

Pull Request resolved: #140964 Approved by: https://github.com/masnesral ghstack dependencies: #141907, #143175
…143693)" This reverts commit ae3d385. Reverted #143693 on behalf of https://github.com/huydhn due to Sorry for reverting this change but it has a conflict with #143639 that is breaking trunk ([comment](#143693 (comment)))
This reverts commit 23ca7c2. Reverted #143639 on behalf of https://github.com/huydhn due to This is failing OSS tests ([comment](#143639 (comment)))
Reuse partial reductions for complete reductions. We could expand this to cover more types of reductions, although we'd have to be a bit more careful about keeping the intermediate partial reduction in higher precision. For now we only handle the ops that do not depend on a higher compute_dtype_precision, which covers the relevant use case. Fix for #136267. Longer term, we should make sure cooperative reductions fuse partial and complete reductions. Pull Request resolved: #143600 Approved by: https://github.com/vkuzo
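A minimal sketch of the pattern this optimizes, assuming the default inductor backend; after this change the complete `amax` can be computed from the per-row partial `amax` instead of re-reading the whole tensor.

```python
import torch

@torch.compile
def scales(x):
    row_max = x.abs().amax(dim=-1)  # partial (per-row) reduction
    full_max = x.abs().amax()       # complete reduction: can reuse row_max
    return row_max, full_max

scales(torch.randn(1024, 1024))
```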
Summary: LLVM-15 has a warning `-Wunused-variable` which we treat as an error because it's so often diagnostic of a code issue. Unused variables can compromise readability or, worse, performance. This diff either (a) removes an unused variable and, possibly, its associated code or (b) qualifies the variable with `[[maybe_unused]]`. - If you approve of this diff, please use the "Accept & Ship" button :-) Test Plan: Sandcastle Pull Request resolved: #143639 Approved by: https://github.com/kit1980, https://github.com/malfet, https://github.com/cyyever
Test Plan: Sandcastle Pull Request resolved: #143693 Approved by: https://github.com/huydhn
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned audio hash. Pull Request resolved: #143694 Approved by: https://github.com/pytorchbot
…ting a trace like generated kernels and index tensor data (#143430)" This reverts commit 33dd4f1. Reverted #143430 on behalf of https://github.com/huydhn due to The internal diff D58707846 has been backed out ([comment](#143430 (comment)))
# Summary:
Full Context: https://docs.google.com/document/d/1-j5KSbfGFJQcH4sYh7BIeJXso3zYzl5G5yFQqXdKx_o/edit?usp=sharing

tl;dr This change introduces classes that help determine a dynamic memory budget. This will mostly be helpful for models with many implicit graph breaks.

New classes:

*GraphInfoProvider*
* Takes the joint_graph as well as the input memories and runtimes and parses the graph + values into usable forms for the SolverEvaluator.

*KnapsackEvaluator*
* Provides a function: given all four inputs (solver function as a callable, max_dynamic_memory_budget, min_dynamic_memory_budget, dynamic_memory_budget_pareto_granularity), it returns an approximation of the knee point of the pareto distribution.

# Test Plan:
### LintRunner
LintRunner Output: P1700445547

### Unit Tests
```
$ buck test @mode/opt //caffe2/test/functorch:test_ac_knapsack
`@mode/opt` was specified, but not found. Using file at `//mode/opt`. This behavior is being deprecated. Please use `"@//mode/opt"` instead
File changed: fbcode//caffe2/.ruff_cache/0.7.4/.tmpB6PmDS
File changed: fbsource//xplat/caffe2/test/functorch/test_ac_knapsack.py
File changed: fbcode//caffe2/.ruff_cache/0.7.4/.tmpyjCiPn
20 additional file change events
Buck UI: https://www.internalfb.com/buck2/414ead46-9ede-4192-8e1a-5d3c52bdb9cc
Test UI: https://www.internalfb.com/intern/testinfra/testrun/6473924710342830
Network: Up: 0B Down: 0B (reSessionID-159794b9-9d61-477e-8e63-9bdeaa537dca)
Analyzing targets. Remaining 0/214
Executing actions. Remaining 0/6933 0.1s exec time total
Command: test. Finished 1 local
Time elapsed: 18.5s
Tests finished: Pass 15. Fail 0. Fatal 0. Skip 0. Build failure 0
```

### Test Run
Updated the config:
```
activation_memory_budget_solver: DYNAMIC_MEMORY_BUDGET_DP
```
Confirming proper execution via: [aps-fb_fm_v4_768_01_dynamic-2a792ba8af](https://www.internalfb.com/mlhub/pipelines/runs/mast/aps-fb_fm_v4_768_01_dynamic-2a792ba8af?job_attempt=0&version=0&env=PRODUCTION)

Pull Request resolved: #143539 Approved by: https://github.com/jansel
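A hedged sketch of knee-point selection over a pareto frontier of (memory budget, estimated runtime) points, illustrating what KnapsackEvaluator approximates; the distance-to-utopia heuristic and the sample numbers are illustrative choices, not the PR's exact algorithm.

```python
def knee_point(budgets, runtimes):
    # Normalize both axes, then pick the point closest to the utopia point
    # (min budget, min runtime): a common approximation of the pareto knee.
    bmin, bmax = min(budgets), max(budgets)
    rmin, rmax = min(runtimes), max(runtimes)

    def dist(b, r):
        nb = (b - bmin) / ((bmax - bmin) or 1.0)
        nr = (r - rmin) / ((rmax - rmin) or 1.0)
        return nb * nb + nr * nr

    return min(zip(budgets, runtimes), key=lambda p: dist(*p))

# Sweep the budget between min and max at some granularity, estimate the
# runtime at each budget with the solver, then pick the knee:
budgets = [0.1, 0.2, 0.3, 0.4, 0.5]
runtimes = [9.0, 6.0, 3.8, 3.6, 3.5]
print(knee_point(budgets, runtimes))  # -> (0.3, 3.8)
```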
Fixes #ISSUE_NUMBER Pull Request resolved: #141787 Approved by: https://github.com/albanD
Retracing while preserving module call signatures used to be a problem because graph modules don't have submodules at given paths. This led to a number of failing retraceability tests. By not trying to wrap modules with export tracepoints we can pass most of these tests; the only exception is where you do module swapping on retraced programs, which is still not possible. Differential Revision: [D67539304](https://our.internmc.facebook.com/intern/diff/D67539304/) Pull Request resolved: #143676 Approved by: https://github.com/zhxchen17, https://github.com/tugsbayasgalan ghstack dependencies: #143664
This reverts commit 6733045. Reverted #140030 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but my first attempt to fix internal build does not fix all the cases, so let us try again ([comment](#140030 (comment)))
Fixes #ISSUE_NUMBER Pull Request resolved: #143355 Approved by: https://github.com/albanD
Summary: If the module being quantized contains some meta tensors and some tensors on an actual device, we should not fail quantization. Quantization should also not fail if the new quantized module is created on a meta device. Differential Revision: D66895899 Pull Request resolved: #142262 Approved by: https://github.com/iamzainhuda
Fix #143967 Pull Request resolved: #143970 Approved by: https://github.com/EikanWang, https://github.com/jansel
Fixes #136862
1. Removed dead code from torch/_dynamo/convert_frame.py.
2. Ran `lintrunner -a` and all the tests passed.
3. Ran the unit tests and everything seems to be in order.

Pull Request resolved: #140938 Approved by: https://github.com/zou3519
Pull Request resolved: #142347 Approved by: https://github.com/gujinghui, https://github.com/albanD
# Motivation Because an external SYCL queue may have a low priority, we need to support low-priority SYCL queues for native XPU Streams to maintain consistency. Pull Request resolved: #141119 Approved by: https://github.com/gujinghui, https://github.com/albanD ghstack dependencies: #142347
# Motivation This PR aims to introduce `torch.xpu.ExternalStream`, which wraps a SYCL queue created in other libraries for use in PyTorch. # Additional Context Pull Request resolved: #141123 Approved by: https://github.com/albanD, https://github.com/EikanWang ghstack dependencies: #142347, #141119
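A hedged sketch of the intended usage, by analogy with `torch.cuda.ExternalStream`; the constructor argument (an integer address of a native `sycl::queue`) and the helper that produces it are assumptions.

```python
import torch

# Hypothetical: obtain the address of a native sycl::queue from another
# library's interop layer; this helper does not exist in PyTorch.
queue_ptr: int = get_native_sycl_queue_address()

ext = torch.xpu.ExternalStream(queue_ptr)  # assumed ctor, analogous to torch.cuda.ExternalStream
with torch.xpu.stream(ext):
    y = torch.ones(4, device="xpu") * 2    # work is submitted to the wrapped queue
torch.xpu.synchronize()
```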
Pull Request resolved: #143799 Approved by: https://github.com/albanD, https://github.com/EikanWang ghstack dependencies: #142347, #141119, #141123
# Motivation As mentioned in #141119 (comment), we now properly handle a priority value that falls outside of the allowed priority range. # Additional Context If the value falls outside of the allowed priority range, it will automatically be mapped to the nearest valid priority (either lowest or highest). Pull Request resolved: #143849 Approved by: https://github.com/albanD, https://github.com/EikanWang ghstack dependencies: #142347, #141119, #141123, #143799
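A hedged sketch of the clamping behavior; the assumption here is that the XPU priority range follows the usual convention where more negative means higher priority.

```python
import torch

# Values outside the allowed range are mapped to the nearest valid priority
# instead of raising. The exact range printed below is an assumption.
s_hi = torch.xpu.Stream(priority=-1000)  # clamped to the highest valid priority
s_lo = torch.xpu.Stream(priority=1000)   # clamped to the lowest valid priority
print(s_hi.priority, s_lo.priority)
```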
By calling `metal::min` and `metal::max` respectively, with arguments typecast to a common type to avoid ambiguous-call errors. TODO: Implement NaN propagation for both eager and compile, see #143976. `pytest test/inductor/test_torchinductor.py -k _mps` score is 460 failed, 291 passed, 32 skipped. Pull Request resolved: #143977 Approved by: https://github.com/jansel ghstack dependencies: #143948, #143949, #143973
At the moment this is done by generating multiple MetalLibraries. `pytest test/inductor/test_torchinductor.py -k _mps` score is 434 failed, 317 passed, 32 skipped. Pull Request resolved: #143998 Approved by: https://github.com/jansel, https://github.com/ruidazeng ghstack dependencies: #143948, #143949, #143973, #143977
This reverts commit 135a2d4. Reverted #142350 on behalf of https://github.com/jeanschmidt due to breaking internal signals ([comment](#142350 (comment)))
…de in (#143975)" This reverts commit 7c1c073. Reverted #143975 on behalf of https://github.com/jeanschmidt due to Need to revert in order to be able to revert #139321 feel free to merge it back once conflicts are cleared ([comment](#143975 (comment)))
This reverts commit 9e8d84f. Reverted #139321 on behalf of https://github.com/jeanschmidt due to breaking internal signals ([comment](#139321 (comment)))
See #144006

```py
__________________________________________ CudaReproTests.test_repeated_masked_load __________________________________________
RuntimeError: First class dim doesn't work with python 3.12

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jansel/conda/envs/pytorch/lib/python3.12/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/home/jansel/conda/envs/pytorch/lib/python3.12/unittest/case.py", line 634, in run
    self._callTestMethod(testMethod)
  File "/home/jansel/conda/envs/pytorch/lib/python3.12/unittest/case.py", line 589, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
  File "/home/jansel/pytorch/torch/testing/_internal/common_utils.py", line 3108, in wrapper
    method(*args, **kwargs)
  File "/home/jansel/pytorch/test/inductor/test_cuda_repro.py", line 1678, in test_repeated_masked_load
    from functorch.einops import rearrange
  File "/home/jansel/pytorch/functorch/einops/__init__.py", line 1, in <module>
    from .rearrange import rearrange
  File "/home/jansel/pytorch/functorch/einops/rearrange.py", line 7, in <module>
    from functorch._C import dim as _C
ImportError: initialization failed
```

Pull Request resolved: #144006 Approved by: https://github.com/Skylion007
Fixes #141426 Please see details in the issue. Pull Request resolved: #141427 Approved by: https://github.com/jansel
Pull Request resolved: #143926 Approved by: https://github.com/jansel
As titled, this PR adds a kwarg src_data_rank to the distribute_tensor API, to allow the user to specify a specific rank as the source of the full tensor data. Previously we used group_rank=0 by default as the source of truth for single-device semantics; this new option:
* gives advanced users the flexibility to choose the source data rank
* allows the user to specify None explicitly, which means we will skip the communications needed (scatter/broadcast) for cases that do not care about single-device semantics (i.e. loading from a checkpoint); see the sketch after this entry

Pull Request resolved: #143883 Approved by: https://github.com/XilunWu, https://github.com/tianyu-l
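A minimal sketch of the new kwarg, assuming a 1-D device mesh; with `src_data_rank=None` the scatter/broadcast from a single source rank is skipped entirely.

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

mesh = init_device_mesh("cuda", (4,))
t = torch.randn(8, 8)

dt0 = distribute_tensor(t, mesh, [Shard(0)])                   # default: rank 0 is the source
dt2 = distribute_tensor(t, mesh, [Shard(0)], src_data_rank=2)  # rank 2 is the source
# Skip scatter/broadcast entirely, e.g. when loading from a checkpoint:
dtl = distribute_tensor(t, mesh, [Shard(0)], src_data_rank=None)
```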
As titled, this PR propagates src_data_rank in the TP API, so that module-level APIs can leverage the flexibility to choose src_data_rank and avoid the communication when it is not needed. Pull Request resolved: #144005 Approved by: https://github.com/tianyu-l ghstack dependencies: #143883
Follow-up to #143934: this check is no longer necessary, and removing it fixes a subset of inductor tests. Before, `pytest test/inductor/test_torchinductor.py -k _mps` reported 463 failed, 291 passed, 32 skipped; after, 456 failed, 298 passed, 32 skipped. Pull Request resolved: #144055 Approved by: https://github.com/Skylion007
Fixes #143146 Pull Request resolved: #144030 Approved by: https://github.com/malfet
Change the label to make sure the jobs land on a node which has >= 4 GPUs. Pull Request resolved: #140319 Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/kwen2501
Jenkins build for 8f3eb843730f38d7307228485b1accc69c4aa0f0 commit finished as FAILURE
…3944) Pull Request resolved: #143944 Approved by: https://github.com/aorenste ghstack dependencies: #143943
Jenkins build for 8506a2af9aced8f084a27dbf73811a947a47d3f7 commit finished as FAILURE
# Summary: This also makes updates to different repositories throughout FB code to roll in any updates needed for this new release. I was not able to get AsyncMM.cu to build (still trying); Yfiu suggested that I just skip it for now. Test Plan: Have run various build commands to try to expose errors. Pull Request resolved: #143515 Approved by: https://github.com/eqy, https://github.com/Skylion007
Jenkins build for a8c98ce175e20c071a209e7aa69f9f28897cda8b commit finished as FAILURE
Add networkx as a dependency for test_bazel.

Example failure: https://github.com/pytorch/pytorch/actions/runs/12551752021/job/34996706301

```
INFO: From Testing //:test_bazel:
==================== Test output for //:test_bazel:
Traceback (most recent call last):
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/test/_test_bazel.py", line 33, in <module>
    test_simple_compile_eager()
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/test/_test_bazel.py", line 27, in test_simple_compile_eager
    opt_foo1 = torch.compile(foo, backend="eager")
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/__init__.py", line 2533, in compile
    backend = _TorchCompileWrapper(backend, mode, options, dynamic)
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/__init__.py", line 2342, in __init__
    self.compiler_fn = lookup_backend(backend)
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_dynamo/backends/registry.py", line 66, in lookup_backend
    _lazy_import()
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_dynamo/backends/registry.py", line 102, in _lazy_import
    import_submodule(backends)
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_dynamo/utils.py", line 2797, in import_submodule
    importlib.import_module(f"{mod.__name__}.{filename[:-3]}")
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/execroot/pytorch/external/python3_10_x86_64-unknown-linux-gnu/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_dynamo/backends/common.py", line 12, in <module>
    from torch._functorch.aot_autograd import (
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_functorch/aot_autograd.py", line 147, in <module>
    from .partitioners import default_partition
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_functorch/partitioners.py", line 31, in <module>
    from ._activation_checkpointing.graph_info_provider import GraphInfoProvider
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_functorch/_activation_checkpointing/graph_info_provider.py", line 3, in <module>
    import networkx as nx
ModuleNotFoundError: No module named 'networkx'
```

No periodic runs on this PR or its main branch commit, but I'm pretty sure it started with https://togithub.com/pytorch/pytorch/pull/143539. Pull Request resolved: #143995 Approved by: https://github.com/huydhn
Jenkins build for bb5e439f2d8a46172b8b7d2fdb7609822b9a97b1 commit finished as FAILURE
I wonder if it's related to the build timeout for the PRs based on this branch, e.g. pytorch#143695.