Tags: NripeshN/pytorch
[Device] Add "mps" to `torch._utils._get_device_attr`

This is a regression introduced by pytorch#141098 that went unnoticed due to pytorch#142206.

Test plan:
```
python test_autograd.py -v -k test_dataparallel_saved_tensors_hooks
```
Before this change it failed with
```
ERROR: test_dataparallel_saved_tensors_hooks (__main__.TestMultithreadAutograd.test_dataparallel_saved_tensors_hooks)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/malfet/git/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 3108, in wrapper
    method(*args, **kwargs)
    ~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/malfet/git/pytorch/pytorch/test/test_autograd.py", line 13074, in test_dataparallel_saved_tensors_hooks
    model = torch.nn.DataParallel(Model())
  File "/Users/malfet/git/pytorch/pytorch/torch/nn/parallel/data_parallel.py", line 153, in __init__
    raise RuntimeError("no available devices were found")
RuntimeError: no available devices were found
```
After this change it passes.
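For context, a minimal sketch of what the fix amounts to, assuming the existing helper structure in `torch/_utils.py` (the real diff may differ in detail): the MPS backend just needs its own branch so that `DataParallel`'s device probing finds it.
```
# Sketch only; the actual helpers live in torch/_utils.py.
import torch


def _get_available_device_type():
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return None


def _get_device_attr(get_member):
    device_type = _get_available_device_type()
    if device_type and device_type.lower() == "cuda":
        return get_member(torch.cuda)
    if device_type and device_type.lower() == "mps":
        return get_member(torch.mps)  # the restored "mps" case
    return None
```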
[fr] change back VLOG(2) to LOG(INFO)

Summary: Change the log message for future execution back from VLOG(2) to LOG(INFO). This message is useful for Flight Recorder to verify that flight recorder dumps completed successfully (or not).

Test Plan: Tested manually on a MAST job and confirmed that the INFO message appeared as expected.

Differential Revision: D66996439
[Profiler] Add CUDA Overhead to Auto-trace (pytorch#142271)

Summary: We already have CUDA OVERHEAD events enabled in on-demand profiling, so we should also add them to auto-trace.

Test Plan: Tested using servicelab and found no performance difference between the two kineto_benchmark runs:

| metric | run 1 | run 2 |
| --- | --- | --- |
| duration_ms | 21668 | 21718 |
| number_of_events | 26542 | 26556 |
| profiler_prepare_call_duration_us | 970 | 885 |
| profiler_enable_call_duration_us | 616474 | 7037 |
| profiling_window_duration_us | 2188525 | 1772481 |
| profiler_disable_call_duration_us | 148628 | 174122 |
| parse_kineto_call_duration_us | 1672536 | 1983683 |
| function_events_build_tree_call_duration_us | 285939 | 333582 |

Differential Revision: D66904879
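The servicelab `kineto_benchmark` harness is internal; as a rough, hedged sketch of the kind of measurement the metrics above describe, one can time an auto-trace window with the public `torch.profiler` API and count the events it collects. The workload and metric names here are illustrative, not the benchmark's actual code.
```
# Minimal sketch, not the internal kineto_benchmark.
import time

import torch
from torch.profiler import ProfilerActivity, profile


def workload():
    x = torch.randn(1024, 1024)
    for _ in range(50):
        x = x @ x
    return x


activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

start = time.perf_counter()
with profile(activities=activities) as prof:
    workload()
window_us = (time.perf_counter() - start) * 1e6

print(f"profiling_window_duration_us: {window_us:.0f}")
print(f"number_of_events: {len(prof.events())}")
```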
Update on "[dtensor][cp][experiment] add CP experimental API to choos… …e rotate method" **Summary** This PR adds a new experimental API `set_rotate_method` for Context Parallel. This API allows user to choose the desired communication method (between all-to-all and all-gather) for shards rotation. **Test** `pytest test/distributed/_tensor/test_attention.py` cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o [ghstack-poisoned]
Update on "add torchrec collectives to enforce global ordering" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov [ghstack-poisoned]
Update on "Support tensor subclass unwrapping" Differential Revision: [D66690419](https://our.internmc.facebook.com/intern/diff/D66690419) This PR adds support for export to unwrap/wrap subclasses AOT so that we can trace through subclass parameters. This will resolve the UX issue in torchao where users had to manually unwrap their subclasses before calling export. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Update on "Refactor NJT to hold metadata on nested int" Design: https://docs.google.com/document/d/1HV9719blS8OJxf8kuW5U3ihaoTR_H7sJJ29U7mT4J1g/edit?tab=t.0#heading=h.w4x2tmi9rtmd cc ezyang SherlockNoMad EikanWang jgong5 wenzhe-nrv voznesenskym penguinwu Guobing-Chen XiaobingSuper zhuhaozhe blzheng jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]