8000 sync with top of pytorch tree by xw285cornell · Pull Request #1810 · ROCm/pytorch · GitHub
[go: up one dir, main page]

Skip to content

sync with top of pytorch tree #1810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 219 commits into from
Jan 2, 2025
Merged

sync with top of pytorch tree #1810

merged 219 commits into from
Jan 2, 2025

Conversation

xw285cornell
Copy link

I wonder if it's related with the build timeout for the PRs based on this branch, e.g. pytorch#143695

pytorchmergebot and others added 30 commits December 21, 2024 00:06
…143332)"

This reverts commit a9c753b.

Reverted #143332 on behalf of https://github.com/malfet due to Surprisingly failure is caused by this PR ([comment](#143332 (comment)))
This PR adds support for export to unwrap/wrap subclasses AOT so that we can trace through subclass parameters. This will resolve the UX issue in torchao where users had to manually unwrap their subclasses before calling export.

Differential Revision: [D67531057](https://our.internmc.facebook.com/intern/diff/D67531057)
Pull Request resolved: #141941
Approved by: https://github.com/bdhirsh
clearing at dynamo start is an issue because it throws away events from compiled autograd

Pull Request resolved: #143175
Approved by: https://github.com/Skylion007, https://github.com/jamesjwu
ghstack dependencies: #141907
When we unflatten, the submodules we generate (`InterpreterModule` or `InterpreterModuleDispatcher`) are not related by type to the original submodules `N`. This makes `isinstance(mod, N)` checks fail. Since we do not have the original types after export, the best we can do is expose a `type_name()` method that carries the original type name, which we do carry in `nn_module_stack` entries.

Differential Revision: [D67526542](https://our.internmc.facebook.com/intern/diff/D67526542/)

Pull Request resolved: #143664
Approved by: https://github.com/tugsbayasgalan
Test Plan: Sandcastle

Differential Revision: D67549758

Pull Request resolved: #143693
Approved by: https://github.com/huydhn
Use set_feature_use for logging aot autograd cache so that dynamo_compile has this data as well as PT2 Compile Events.

Differential Revision: [D67536293](https://our.internmc.facebook.com/intern/diff/D67536293/)
Pull Request resolved: #143674
Approved by: https://github.com/bobrenjc93
…per in runtime. (#142322)

This PR aims to removes the de pendency on Intel Compiler at Inductor runtime. Now we only need a SYCL_HOME in runtime to find the sycl headers and libs.

Pull Request resolved: #142322
Approved by: https://github.com/EikanWang, https://github.com/desertfire, https://github.com/albanD
ghstack dependencies: #143491
Summary: Emit a CMakeLists.txt with compile and link options when package_cpp_only is specified. After unzipping AOTI generated .pt2 package file, user can manually build the generated model code in their local environment.

Pull Request resolved: #143680
Approved by: https://github.com/huydhn
This reverts commit c7d9f29.

Reverted #143402 on behalf of https://github.com/huydhn due to The internal diff D67148738 has been reverted ([comment](#143402 (comment)))
…k memory usage (#143347)"

This reverts commit efe21ee.

Reverted #143347 on behalf of https://github.com/huydhn due to D67118173 has been backed out internally ([comment](#143347 (comment)))
https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmprli4iy/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100
```
[
  {
    "args": {
      "compile_id": "0/-/-",
      "graph_id": 0
    },
    "cat": "dynamo_timed",
    "name": "compiled_autograd",
    "ph": "B",
    "pid": 0,
    "tid": 0,
    "ts": 1733886868992655.8
  },
  {
    "args": {
      "compile_id": "0/-/-",
      "graph_id": 0
    },
    "cat": "dynamo_timed",
    "name": "compiled_autograd",
    "ph": "E",
    "pid": 0,
    "tid": 0,
    "ts": 1733886869130681.0
  },
  {
    "args": {
      "compile_id": "0/0/0"
    },
    "cat": "dynamo_timed",
    "name": "dynamo",
    "ph": "B",
    "pid": 0,
    "tid": 0,
    "ts": 1733886869134350.5
  },
  {
```

Pull Request resolved: #140964
Approved by: https://github.com/masnesral
ghstack dependencies: #141907, #143175
…143693)"

This reverts commit ae3d385.

Reverted #143693 on behalf of https://github.com/huydhn due to Sorry for reverting this change but it has a conflict with #143639 that is breaking trunk ([comment](#143693 (comment)))
This reverts commit 23ca7c2.

Reverted #143639 on behalf of https://github.com/huydhn due to This is failing OSS tests ([comment](#143639 (comment)))
Reuse partial reductions for complete reductions. We could expand this to more cover more types of reductions, although we'd have to be a bit more careful about keeping the intermediary, partial reduction in higher precision.

Just doing the ops which do not depend on a higher compute_dtype_precision for now to cover the relevant use case initially.

Fix for #136267. Longer term, we should make sure cooperative reductions fuse partial and complete reductions.

Pull Request resolved: #143600
Approved by: https://github.com/vkuzo
Summary:
LLVM-15 has a warning `-Wunused-variable` which we treat as an error because it's so often diagnostic of a code issue. Unused variables can compromise readability or, worse, performance.

This diff either (a) removes an unused variable and, possibly, it's associated code or (b) qualifies the variable with `[[maybe_unused]]`.

 - If you approve of this diff, please use the "Accept & Ship" button :-)

Test Plan: Sandcastle

Pull Request resolved: #143639
Approved by: https://github.com/kit1980, https://github.com/malfet, https://github.com/cyyever
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: #143694
Approved by: https://github.com/pytorchbot
…ting a trace like generated kernels and index tensor data (#143430)"

This reverts commit 33dd4f1.

Reverted #143430 on behalf of https://github.com/huydhn due to The internal diff D58707846 has been backed out ([comment](#143430 (comment)))
# Summary:
Full Context: https://docs.google.com/document/d/1-j5KSbfGFJQcH4sYh7BIeJXso3zYzl5G5yFQqXdKx_o/edit?usp=sharing

tl;dr

This change introduces classes which help determine a dynamic memory budget. This will mostly be helpful for models with many implicit graph breaks.

---

New Classes:

*GraphInfoProvider*
* Takes the joint_graph as well as the input memories and runtimes and parses the graph + values into usable forms for the SolverEvaluator.

*KnapsackEvaluator*
* Provides a function: Given all of the four inputs (solver function as a callable, max_dynamic_memory_budget, min_dynamic_memory_budget, dynamic_memory_budget_pareto_granularity) it returns an approximation of the knee point of the pareto distribution.

# Test Plan:

### LintRunner

LintRunner Output: P1700445547

### Unit Tests

```
$ buck test @mode/opt //caffe2/test/functorch:test_ac_knapsack
`@mode/opt` was specified, but not found. Using file at `//mode/opt`.
This behavior is being deprecated. Please use `"@//mode/opt"` instead
File changed: fbcode//caffe2/.ruff_cache/0.7.4/.tmpB6PmDS
File changed: fbsource//xplat/caffe2/test/functorch/test_ac_knapsack.py
File changed: fbcode//caffe2/.ruff_cache/0.7.4/.tmpyjCiPn
20 additional file change events
Buck UI: https://www.internalfb.com/buck2/414ead46-9ede-4192-8e1a-5d3c52bdb9cc
Test UI: https://www.internalfb.com/intern/testinfra/testrun/6473924710342830
Network: Up: 0B  Down: 0B  (reSessionID-159794b9-9d61-477e-8e63-9bdeaa537dca)
Analyzing targets. Remaining     0/214
Executing actions. Remaining     0/6933                                                                                                                                                                                  0.1s exec time total
Command: test.     Finished 1 local
Time elapsed: 18.5s
Tests finished: Pass 15. Fail 0. Fatal 0. Skip 0. Build failure 0
```

### Test Run

Updated the config:

```
      activation_memory_budget_solver: DYNAMIC_MEMORY_BUDGET_DP
```

Confirming proper execution via: [aps-fb_fm_v4_768_01_dynamic-2a792ba8af](https://www.internalfb.com/mlhub/pipelines/runs/mast/aps-fb_fm_v4_768_01_dynamic-2a792ba8af?job_attempt=0&version=0&env=PRODUCTION)

Pull Request resolved: #143539
Approved by: https://github.com/jansel
Fixes #ISSUE_NUMBER

Pull Request resolved: #141787
Approved by: https://github.com/albanD
Retracing while preserving module call signatures used to be a problem because graph modules don't have submodules at given paths. This led to a number of failing retracebility tests. By not trying to wrap modules with export tracepoints we can pass most of these tests; the only exception is where you do module swapping on retraced programs, which is still not possible.

Differential Revision: [D67539304](https://our.internmc.facebook.com/intern/diff/D67539304/)
Pull Request resolved: #143676
Approved by: https://github.com/zhxchen17, https://github.com/tugsbayasgalan
ghstack dependencies: #143664
This reverts commit 6733045.

Reverted #140030 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but my first attempt to fix internal build does not fix all the cases, so let us try again ([comment](#140030 (comment)))
Fixes #ISSUE_NUMBER

Pull Request resolved: #143355
Approved by: https://github.com/albanD
Summary:
If module being quantized contains a some meta tensors and some tensors with actual device, we should not fail quantization.

Quantization should also not fail if new quantized module is created on a meta device.

Differential Revision: D66895899

Pull Request resolved: #142262
Approved by: https://github.com/iamzainhuda
etaf and others added 20 commits December 31, 2024 06:28
Fixes #136862

1.  removed dead code from torch/_dynamo/convert_frame.py
2.  ran `lintrunner -a` and all the tests passed.
3. ran the unit tests and everything seems to be in order.

Pull Request resolved: #140938
Approved by: https://github.com/zou3519
# Motivation
Due to the potential for the external SYCL queue to have a low priority, we need to support the low-priority SYCL queue for native XPU Streams to maintain consistency.

Pull Request resolved: #141119
Approved by: https://github.com/gujinghui, https://github.com/albanD
ghstack dependencies: #142347
# Motivation
This PR aims to introduce `torch.xpu.ExternalStream` to be used to wrap SYCL queue created in other libraries to PyTorch.

# Additional Context

Pull Request resolved: #141123
Approved by: https://github.com/albanD, https://github.com/EikanWang
ghstack dependencies: #142347, #141119
# Motivation
As mentioned in #141119 (comment), we properly handle the priority value if it is outside of the priority range.

# Additional Context
If the value falls outside of the allowed priority range, it will automatically be mapped to the nearest valid priority(either lowest or highest).

Pull Request resolved: #143849
Approved by: https://github.com/albanD, https://github.com/EikanWang
ghstack dependencies: #142347, #141119, #141123, #143799
By calling `metal::min` and `metal::max` respectively with argument
typecast to a common type to avoid ambiguous calls errors

TODO: Implement NaN propagation for both eager and compile, see #143976

`pytest test/inductor/test_torchinductor.py -k _mps` score is 460 failed, 291 passed, 32 skipped

Pull Request resolved: #143977
Approved by: https://github.com/jansel
ghstack dependencies: #143948, #143949, #143973
At the moment by generating multiple MetalLibraries

`pytest test/inductor/test_torchinductor.py -k _mps` score is 434 failed, 317 passed, 32 skipped

Pull Request resolved: #143998
Approved by: https://github.com/jansel, https://github.com/ruidazeng
ghstack dependencies: #143948, #143949, #143973, #143977
This reverts commit 135a2d4.

Reverted #142350 on behalf of https://github.com/jeanschmidt due to breaking internal signals ([comment](#142350 (comment)))
…de in (#143975)"

This reverts commit 7c1c073.

Reverted #143975 on behalf of https://github.com/jeanschmidt due to Need to revert in order to be able to revert #139321 feel free to merge it back once conflicts are cleared ([comment](#143975 (comment)))
This reverts commit 9e8d84f.

Reverted #139321 on behalf of https://github.com/jeanschmidt due to breaking internal signals ([comment](#139321 (comment)))
See #144006

```py
__________________________________________ CudaReproTests.test_repeated_masked_load __________________________________________
RuntimeError: First class dim doesn't work with python 3.12

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jansel/conda/envs/pytorch/lib/python3.12/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/home/jansel/conda/envs/pytorch/lib/python3.12/unittest/case.py", line 634, in run
    self._callTestMethod(testMethod)
  File "/home/jansel/conda/envs/pytorch/lib/python3.12/unittest/case.py", line 589, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
  File "/home/jansel/pytorch/torch/testing/_internal/common_utils.py", line 3108, in wrapper
    method(*args, **kwargs)
  File "/home/jansel/pytorch/test/inductor/test_cuda_repro.py", line 1678, in test_repeated_masked_load
    from functorch.einops import rearrange
  File "/home/jansel/pytorch/functorch/einops/__init__.py", line 1, in <module>
    from .rearrange import rearrange
  File "/home/jansel/pytorch/functorch/einops/rearrange.py", line 7, in <module>
    from functorch._C import dim as _C
ImportError: initialization failed
```

Pull Request resolved: #144006
Approved by: https://github.com/Skylion007
Fixes #141426

Please see details in the issue.
Pull Request resolved: #141427
Approved by: https://github.com/jansel
As titled, this PR add a kwarg src_data_rank to the distribute_tensor
API, to allow user specify a specific rank as the full tensor source
data. Previously we by default specify group_rank=0 as the source of
truth for single device semantic, this new option:

* gives advanced user flexiblity to choose the source data rank
* allow user to specify None explicity, which means we will skip the
  communications needed (scatter/broadcast) for the cases that does not
care about single device semantic (i.e. loading from a checkpoint)

Pull Request resolved: #143883
Approved by: https://github.com/XilunWu, https://github.com/tianyu-l
as titled, this PR propagates the src_data_rank in the TP API, so that
module level APIs could leverage the flexibility to choose
src_data_rank, and avoid the communication if it does not need to

Pull Request resolved: #144005
Approved by: https://github.com/tianyu-l
ghstack dependencies: #143883
Followup after #143934, this check is no longer necessary and fixes a subset of inductor tests

Before `pytest test/inductor/test_torchinductor.py -k _mps` reports 463
failed, 291 passed, 32 skipped after 456 failed, 298 passed, 32 skipped
Pull Request resolved: #144055
Approved by: https://github.com/Skylion007
Change the label to make sure the jobs land on a
node which has >= 4 GPUs.

Pull Request resolved: #140319
Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/kwen2501
@rocm-repo-management-api
Copy link
rocm-repo-management-api bot commented Jan 2, 2025

Jenkins build for 8f3eb843730f38d7307228485b1accc69c4aa0f0 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@rocm-repo-management-api
Copy link
rocm-repo-management-api bot commented Jan 2, 2025

Jenkins build for 8506a2af9aced8f084a27dbf73811a947a47d3f7 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

# Summary:

This also makes updates to different repositories throughout FB code to roll any updates needed for this new release.

I was not able to get AsyncMM.cu to build (still trying) Yfiu suggested that I just skip it for now

Test Plan:
Have run various build commands to try and expose errors

Pull Request resolved: #143515
Approved by: https://github.com/eqy, https://github.com/Skylion007
@rocm-repo-management-api
Copy link
rocm-repo-management-api bot commented Jan 2, 2025

Jenkins build for a8c98ce175e20c071a209e7aa69f9f28897cda8b commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Add networkx as a dependency for test_bazel

Example failure: https://github.com/pytorch/pytorch/actions/runs/12551752021/job/34996706301

```

INFO: From Testing //:test_bazel:
==================== Test output for //:test_bazel:
Traceback (most recent call last):
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/test/_test_bazel.py", line 33, in <module>
    test_simple_compile_eager()
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/test/_test_bazel.py", line 27, in test_simple_compile_eager
    opt_foo1 = torch.compile(foo, backend="eager")
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/__init__.py", line 2533, in compile
    backend = _TorchCompileWrapper(backend, mode, options, dynamic)
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/__init__.py", line 2342, in __init__
    self.compiler_fn = lookup_backend(backend)
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_dynamo/backends/registry.py", line 66, in lookup_backend
    _lazy_import()
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_dynamo/backends/registry.py", line 102, in _lazy_import
    import_submodule(backends)
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_dynamo/utils.py", line 2797, in import_submodule
    importlib.import_module(f"{mod.__name__}.{filename[:-3]}")
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/execroot/pytorch/external/python3_10_x86_64-unknown-linux-gnu/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_dynamo/backends/common.py", line 12, in <module>
    from torch._functorch.aot_autograd import (
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_functorch/aot_autograd.py", line 147, in <module>
    from .partitioners import default_partition
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_functorch/partitioners.py", line 31, in <module>
    from ._activation_checkpointing.graph_info_provider import GraphInfoProvider
  File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_functorch/_activation_checkpointing/graph_info_provider.py", line 3, in <module>
    import networkx as nx
ModuleNotFoundError: No module named 'networkx'
```

No periodic runs on this PR or its main branch commit, but I'm pretty sure its started on https://togithub.com/pytorch/pytorch/pull/143539

Pull Request resolved: #143995
Approved by: https://github.com/huydhn
@rocm-repo-management-api
Copy link
rocm-repo-management-api bot commented Jan 2, 2025

Jenkins build for bb5e439f2d8a46172b8b7d2fdb7609822b9a97b1 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@jeffdaily jeffdaily merged this pull request into ROCm:main Jan 2, 2025
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0