DISABLED AotInductorTest.BasicPackageLoaderTestCuda (build.bin.test_aoti_inference) · Issue #152674 · pytorch/pytorch · GitHub
Description

@pytorch-bot

Platforms: inductor

This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.

Over the past 3 hours, it has been determined to be flaky in 3 workflow(s), with 3 failures and 3 successes.

Debugging instructions (after clicking on the recent samples link):
DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. We now shield flaky tests from developers, so CI will still be green, but the logs will be harder to parse.
To find relevant log snippets:

  1. Click on the workflow logs linked above
  2. Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
  3. Grep for AotInductorTest.BasicPackageLoaderTestCuda
  4. Several instances of the test should have run (flaky tests are rerun in CI), giving you multiple logs to study and compare.
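Step 3 can be sketched as follows. The log file name here is illustrative; in practice you would search the raw log downloaded from the workflow run linked above:

```shell
# Stand-in for a downloaded CI log (contents are illustrative only;
# real logs follow the same gtest [ RUN ] / [ FAILED ] framing).
printf '[ RUN      ] AotInductorTest.BasicPackageLoaderTestCuda\n[  FAILED  ] AotInductorTest.BasicPackageLoaderTestCuda\n' > ci_test_step.log

# Grep for the test name; -n prints line numbers so the surrounding
# context is easy to locate in the full log.
grep -n 'AotInductorTest.BasicPackageLoaderTestCuda' ci_test_step.log
```

Because flaky tests are rerun, expect multiple `[ RUN ]` hits per log; comparing a failing rerun against a passing one is usually the fastest way to spot the divergence.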
Sample error message
unknown file
C++ exception with description "CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /var/lib/jenkins/workspace/c10/cuda/CUDAException.cpp:42 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9c (0x7f697b6bb1cc in /var/lib/jenkins/workspace/build/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x104 (0x7f697b64b73a in /var/lib/jenkins/workspace/build/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x40d (0x7f697b788e3d in /var/lib/jenkins/workspace/build/lib/libc10_cuda.so)
frame #3: void at::native::gpu_kernel_impl_nocast<at::native::BinaryFunctor<float, float, bool, at::native::(anonymous namespace)::CompareEqFunctor<float> > >(at::TensorIteratorBase&, at::native::BinaryFunctor<float, float, bool, at::native::(anonymous namespace)::CompareEqFunctor<float> > const&) + 0x7fd (0x7f696645c69d in /var/lib/jenkins/workspace/build/lib/libtorch_cuda.so)
frame #4: void at::native::gpu_kernel_impl<at::native::BinaryFunctor<float, float, bool, at::native::(anonymous namespace)::CompareEqFunctor<float> > >(at::TensorIteratorBase&, at::native::BinaryFunctor<float, float, bool, at::native::(anonymous namespace)::CompareEqFunctor<float> > const&) + 0x44f (0x7f696645cddf in /var/lib/jenkins/workspace/build/lib/libtorch_cuda.so)
frame #5: void at::native::gpu_kernel<at::native::BinaryFunctor<float, float, bool, at::native::(anonymous namespace)::CompareEqFunctor<float> > >(at::TensorIteratorBase&, at::native::BinaryFunctor<float, float, bool, at::native::(anonymous namespace)::CompareEqFunctor<float> > const&) + 0x35b (0x7f696645d57b in /var/lib/jenkins/workspace/build/lib/libtorch_cuda.so)
frame #6: void at::native::opmath_symmetric_gpu_kernel_with_scalars<float, bool, at::native::(anonymous namespace)::CompareEqFunctor<float> >(at::TensorIteratorBase&, at::native::(anonymous namespace)::CompareEqFunctor<float> const&) + 0x195 (0x7f6966483445 in /var/lib/jenkins/workspace/build/lib/libtorch_cuda.so)
frame #7: at::native::compare_eq_ne_kernel(at::TensorIteratorBase&, at::native::(anonymous namespace)::EqOpType) + 0x178 (0x7f6966432128 in /var/lib/jenkins/workspace/build/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0x3cbae29 (0x7f69684bae29 in /var/lib/jenkins/workspace/build/lib/libtorch_cuda.so)
frame #9: <unknown function> + 0x3cbaef8 (0x7f69684baef8 in /var/lib/jenkins/workspace/build/lib/libtorch_cuda.so)
frame #10: at::_ops::eq_Tensor::call(at::Tensor const&, at::Tensor const&) + 0x1b2 (0x7f697dc76fb2 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
frame #11: at::native::isclose(at::Tensor const&, at::Tensor const&, double, double, bool) + 0xbe (0x7f697d678a7e in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x30d56dc (0x7f697e8d56dc in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
frame #13: at::_ops::isclose::call(at::Tensor const&, at::Tensor const&, double, double, bool) + 0x1ee (0x7f697e3a1aae in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
frame #14: at::native::allclose(at::Tensor const&, at::Tensor const&, double, double, bool) + 0x37 (0x7f697d6767d7 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
frame #15: <unknown function> + 0x5027cc5 (0x7f6980827cc5 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
frame #16: at::_ops::allclose::call(at::Tensor const&, at::Tensor const&, double, double, bool) + 0x1cd (0x7f697dc6e5bd in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
frame #17: <unknown function> + 0x336ff (0x5573a72216ff in /var/lib/jenkins/workspace/build/bin/test_aoti_inference)
frame #18: torch::aot_inductor::AotInductorTest_BasicPackageLoaderTestCuda_Test::TestBody() + 0x41 (0x5573a7221ad1 in /var/lib/jenkins/workspace/build/bin/test_aoti_inference)
frame #19: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x51 (0x5573a7273271 in /var/lib/jenkins/workspace/build/bin/test_aoti_inference)
frame #20: <unknown function> + 0x750a0 (0x5573a72630a0 in /var/lib/jenkins/workspace/build/bin/test_aoti_inference)
frame #21: testing::TestInfo::Run() + 0x40a (0x5573a72635ba in /var/lib/jenkins/workspace/build/bin/test_aoti_inference)
frame #22: <unknown function> + 0x79699 (0x5573a7267699 in /var/lib/jenkins/workspace/build/bin/test_aoti_inference)
frame #23: testing::internal::UnitTestImpl::RunAllTests() + 0xf28 (0x5573a7268ae8 in /var/lib/jenkins/workspace/build/bin/test_aoti_inference)
frame #24: testing::UnitTest::Run() + 0x93 (0x5573a72692b3 in /var/lib/jenkins/workspace/build/bin/test_aoti_inference)
frame #25: main + 0x104 (0x5573a721d8f4 in /var/lib/jenkins/workspace/build/bin/test_aoti_inference)
frame #26: __libc_start_main + 0xf3 (0x7f696407e083 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #27: _start + 0x2e (0x5573a721f48e in /var/lib/jenkins/workspace/build/bin/test_aoti_inference)
" thrown in the test body.
unknown file:0: C++ failure

Test file path: `` or test/run_test

Error: Error retrieving : 400, test/run_test: 404
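The "no kernel image is available for execution on the device" error typically means the binary was built without SASS/PTX for the GPU architecture the CI runner exposes (e.g. a `TORCH_CUDA_ARCH_LIST` mismatch), rather than a bug in the test body itself. A hypothetical local repro, assuming a CUDA-enabled build whose test binaries landed in `build/bin/` (as in the workspace paths in the trace above), could use gtest's standard filter flag:

```shell
# Hypothetical local repro sketch; paths assume the CI build layout.
# CUDA_LAUNCH_BLOCKING=1 serializes kernel launches so the reported
# stack trace points at the actual failing CUDA call.
CUDA_LAUNCH_BLOCKING=1 ./build/bin/test_aoti_inference \
    --gtest_filter='AotInductorTest.BasicPackageLoaderTestCuda'
```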

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @clee2000 @chauhang @penguinwu @avikchaudhuri @gmagogsfm @zhxchen17 @tugsbayasgalan @angelayi @suo @ydwu4 @desertfire @chenyang78 @yushangdi @benjaminglass1

Metadata

Assignees: No one assigned

    Labels: high priority, module: aotinductor, aot inductor, module: flaky-tests (Problem is a flaky test in CI), oncall: export, oncall: pt2, skipped (Denotes a (flaky) test currently skipped in CI.), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
