MacOS tests has not been running for few weeks #142206
Comments
Landed @clee2000's #142270 to enable testing
Forward fixes:
For those looking through the tests: I merged #142421 to enable keep-going (continue on error) on trunk for the Mac default tests. The red signal will show up later, but you can see failing tests mid-run on HUD by clicking the additional test failures button. Once the run has finished, you can also search the logs for "consistently: ".
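The log search mentioned above could be scripted roughly as follows. This is only a sketch: the function name and the sample log lines are hypothetical, and the only detail taken from the comment is the marker string "consistently: ". Real CI log lines may be formatted differently.

```python
from typing import Iterable, List

# Marker string quoted in the comment above.
MARKER = "consistently: "


def find_consistent_failures(log_lines: Iterable[str]) -> List[str]:
    """Return the text that follows the marker on every line containing it."""
    hits = []
    for line in log_lines:
        # partition() returns (before, marker, after); marker is "" if absent.
        _, found, rest = line.partition(MARKER)
        if found:
            hits.append(rest.strip())
    return hits


# Hypothetical log excerpt, made up for illustration only.
sample_log = [
    "Running test_example ...",
    "The following tests failed consistently: ['test_example.py::TestExample::test_one']",
    "Done",
]

for hit in find_consistent_failures(sample_log):
    print(hit)
```

For a downloaded log, passing an open file object instead of `sample_log` works the same way, since the function only iterates over lines.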
Where int64_t is long long rather than long

This fixes a test regression introduced by #140597 that went undetected due to #142206

Pull Request resolved: #142440
Approved by: https://github.com/kit1980
As `torch._C._scatter` is only defined for CUDA/ROCm (and maybe XPU?)

This is a regression introduced by #141098 that went unnoticed due to #142206

Test plan:
```
python test_autograd.py -v -k test_dataparallel_saved_tensors_hooks
```

Before this change it failed with
```
ERROR: test_dataparallel_saved_tensors_hooks (__main__.TestMultithreadAutograd.test_dataparallel_saved_tensors_hooks)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/malfet/git/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 3108, in wrapper
    method(*args, **kwargs)
    ~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/malfet/git/pytorch/pytorch/test/test_autograd.py", line 13074, in test_dataparallel_saved_tensors_hooks
    model = torch.nn.DataParallel(Model())
  File "/Users/malfet/git/pytorch/pytorch/torch/nn/parallel/data_parallel.py", line 153, in __init__
    raise RuntimeError("no available devices were found")
RuntimeError: no available devices were found
```

After this change it passes

Pull Request resolved: #142448
Approved by: https://github.com/kit1980
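For readers unfamiliar with this failure mode, here is a minimal sketch of one way a `torch.nn.DataParallel` test could be guarded on machines without CUDA/ROCm devices, such as the Mac runners. It is not necessarily what #142448 does. The helper `cuda_device_count`, the test class, and `run_smoke_test` are hypothetical names introduced for illustration. Only `torch.cuda.is_available()`, `torch.cuda.device_count()`, `torch.nn.DataParallel`, and the standard `unittest` skip decorator are real APIs.

```python
import unittest


def cuda_device_count() -> int:
    """Number of visible CUDA/ROCm devices, or 0 if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count() if torch.cuda.is_available() else 0


class DataParallelSmokeTest(unittest.TestCase):
    # Without this guard, constructing DataParallel on a machine with no
    # CUDA/ROCm device raises "no available devices were found".
    @unittest.skipIf(cuda_device_count() == 0,
                     "DataParallel needs at least one CUDA/ROCm device")
    def test_wraps_module(self):
        import torch

        model = torch.nn.DataParallel(torch.nn.Linear(2, 2))
        self.assertIsInstance(model, torch.nn.DataParallel)


def run_smoke_test() -> unittest.TestResult:
    """Run the single test above and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DataParallelSmokeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_smoke_test()
    print("skipped:", len(result.skipped), "failures:", len(result.failures))
```

On a machine without an accelerator the test is reported as skipped rather than as an error, which keeps the signal green while still recording that the test did not exercise `DataParallel`.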
Mitigated, see successful run here
What would be good to discuss in a post-mortem:
No post-mortem discussion ever happened, but tests are running now, so closing.
#135386 rendered the regular MacOS test shard useless.
I.e. https://github.com/pytorch/pytorch/actions/runs/12191328925/job/34010247281?pr=141921 finishes in 18 sec for PR #141921, which could have some effect on Mac tests.
Versions
CI
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @pytorch/pytorch-dev-infra