-
Notifications
You must be signed in to change notification settings - Fork 24.8k
Description
Platforms: linux, rocm, slow, inductor
This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.
Over the past 3 hours, it has been determined flaky in 16 workflow(s) with 16 failures and 16 successes.
Debugging instructions (after clicking on the recent samples link):
DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. We now shield flaky tests from developers so CI will thus be green but it will be harder to parse the logs.
To find relevant log snippets:
- Click on the workflow logs linked above
- Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
- Grep for
test_comprehensive_pca_lowrank_cuda_float32
- There should be several instances run (as flaky tests are rerun in CI) from which you can study the logs.
Sample error message
Traceback (most recent call last):
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1152, in test_wrapper
return test(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1434, in only_fn
return fn(self, *args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2199, in wrapper
fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1229, in dep_fn
return fn(slf, *args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1229, in dep_fn
return fn(slf, *args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1229, in dep_fn
return fn(slf, *args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 1592, in wrapper
fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 1528, in wrapper
fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/mock.py", line 1379, in patched
return func(*newargs, **newkeywargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor_opinfo.py", line 955, in inner
raise e
File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor_opinfo.py", line 947, in inner
fn(self, device, dtype, op)
File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor_opinfo.py", line 1193, in test_comprehensive
raise e
File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor_opinfo.py", line 1153, in test_comprehensive
self.check_model_gpu(
File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 606, in check_model_gpu
check_model(
File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 557, in check_model
actual_grad = compute_grads(example_inputs, kwargs, actual, grads)
File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 344, in compute_grads
return torch.autograd.grad(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/__init__.py", line 496, in grad
result = _engine_run_backward(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 307, in apply
return user_fn(self, *args)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1705, in backward
return impl_fn()
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1695, in impl_fn
out = CompiledFunction._backward_impl(ctx, all_args)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2006, in _backward_impl
CompiledFunction.compiled_bw = aot_config.bw_compiler(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 51, in _wrapped_bw_compiler
return disable(disable(bw_compiler)(*args, **kwargs))
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 721, in _fn
return fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 95, in wrapper_function
return function(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1650, in bw_compiler
return inner_compile(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 587, in compile_fx_inner
return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 100, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 744, in _compile_fx_inner
compiled_graph = FxGraphCache.load(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 1512, in load
compiled_graph = compile_fx_fn(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 651, in codegen_and_compile
compiled_graph = fx_codegen_and_compile(gm, example_inputs, **fx_kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1015, in fx_codegen_and_compile
compiled_graph = CompiledFxGraph(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 1655, in __init__
with open(graph.cache_path) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpjhnij71z/uv/cuvrglhsawj5hchejqsla3dp6oggzvxlqkcmdqfviylaldxayxhu.py'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3057, in wrapper
method(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3057, in wrapper
method(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 460, in instantiated_test
result = test(self, **param_kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_cuda.py", line 243, in wrapped
return f(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1229, in dep_fn
return fn(slf, *args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1229, in dep_fn
return fn(slf, *args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 1592, in wrapper
fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1164, in test_wrapper
raise e_tracked from e
Exception: Caused by sample input at index 1: SampleInput(input=Tensor[size=(3, 2), device="cuda:0", dtype=torch.float32], args=TensorList[Tensor[size=(4, 2), device="cuda:0", dtype=torch.float32]], kwargs={'q': '2', 'center': 'True'}, broadcasts_input=False, name='')
To execute this test, run the following from the base repo dir:
PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=1 python test/inductor/test_torchinductor_opinfo.py TestInductorOpInfoCUDA.test_comprehensive_pca_lowrank_cuda_float32
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
Test file path: inductor/test_torchinductor_opinfo.py
cc @clee2000 @ezyang @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @aakhundov