Tags: NripeshN/pytorch
[Device] Add "mps" to `torch._utils._get_device_attr`

This is a regression introduced by pytorch#141098 that went unnoticed due to pytorch#142206.

Test plan:
```
python test_autograd.py -v -k test_dataparallel_saved_tensors_hooks
```
Before this change it failed with
```
ERROR: test_dataparallel_saved_tensors_hooks (__main__.TestMultithreadAutograd.test_dataparallel_saved_tensors_hooks)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/malfet/git/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 3108, in wrapper
    method(*args, **kwargs)
    ~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/malfet/git/pytorch/pytorch/test/test_autograd.py", line 13074, in test_dataparallel_saved_tensors_hooks
    model = torch.nn.DataParallel(Model())
  File "/Users/malfet/git/pytorch/pytorch/torch/nn/parallel/data_parallel.py", line 153, in __init__
    raise RuntimeError("no available devices were found")
RuntimeError: no available devices were found
```
After this change it passes.
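For context, a minimal sketch of what the fix amounts to, assuming the existing helper structure in `torch/_utils.py` (the real diff may differ in detail): the MPS backend just needs its own branch so that `DataParallel`'s device probing finds it.
```
# Sketch only; the actual helpers live in torch/_utils.py.
import torch


def _get_available_device_type():
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return None


def _get_device_attr(get_member):
    device_type = _get_available_device_type()
    if device_type and device_type.lower() == "cuda":
        return get_member(torch.cuda)
    if device_type and device_type.lower() == "mps":
        return get_member(torch.mps)  # the restored "mps" case
    return None
```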
[fr] change back VLOG(2) to LOG(INFO)

Summary: Change the log message for future execution back from VLOG(2) to LOG(INFO). This message is useful for Flight Recorder to verify that flight recorder dumps completed successfully (or not).

Test Plan: Tested manually on a MAST job and confirmed that the INFO message appeared as expected.

Differential Revision: D66996439
[Profiler] Add CUDA Overhead to Auto-trace (pytorch#142271)

Summary: We already have CUDA OVERHEAD events enabled in on-demand profiling, so we should also add them to auto-trace.

Test Plan: Tested using servicelab and found no performance difference between the two kineto_benchmark runs:

| metric | run 1 | run 2 |
| --- | --- | --- |
| duration_ms | 21668 | 21718 |
| number_of_events | 26542 | 26556 |
| profiler_prepare_call_duration_us | 970 | 885 |
| profiler_enable_call_duration_us | 616474 | 7037 |
| profiling_window_duration_us | 2188525 | 1772481 |
| profiler_disable_call_duration_us | 148628 | 174122 |
| parse_kineto_call_duration_us | 1672536 | 1983683 |
| function_events_build_tree_call_duration_us | 285939 | 333582 |

Differential Revision: D66904879
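The servicelab `kineto_benchmark` harness is internal; as a rough, hedged sketch of the kind of measurement the metrics above describe, one can time an auto-trace window with the public `torch.profiler` API and count the events it collects. The workload and metric names here are illustrative, not the benchmark's actual code.
```
# Minimal sketch, not the internal kineto_benchmark.
import time

import torch
from torch.profiler import ProfilerActivity, profile


def workload():
    x = torch.randn(1024, 1024)
    for _ in range(50):
        x = x @ x
    return x


activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

start = time.perf_counter()
with profile(activities=activities) as prof:
    workload()
window_us = (time.perf_counter() - start) * 1e6

print(f"profiling_window_duration_us: {window_us:.0f}")
print(f"number_of_events: {len(prof.events())}")
```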
Update on "[dtensor][cp][experiment] add CP experimental API to choos… …e rotate method" **Summary** This PR adds a new experimental API `set_rotate_method` for Context Parallel. This API allows user to choose the desired communication method (between all-to-all and all-gather) for shards rotation. **Test** `pytest test/distributed/_tensor/test_attention.py` cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o [ghstack-poisoned]
Update on "add torchrec collectives to enforce global ordering" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov [ghstack-poisoned]
Update on "Support tensor subclass unwrapping" Differential Revision: [D66690419](https://our.internmc.facebook.com/intern/diff/D66690419) This PR adds support for export to unwrap/wrap subclasses AOT so that we can trace through subclass parameters. This will resolve the UX issue in torchao where users had to manually unwrap their subclasses before calling export. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Update on "Refactor NJT to hold metadata on nested int" Design: https://docs.google.com/document/d/1HV9719blS8OJxf8kuW5U3ihaoTR_H7sJJ29U7mT4J1g/edit?tab=t.0#heading=h.w4x2tmi9rtmd cc ezyang SherlockNoMad EikanWang jgong5 wenzhe-nrv voznesenskym penguinwu Guobing-Chen XiaobingSuper zhuhaozhe blzheng jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]