CI: Compiler sanitizers tests are hanging intermittently #25875

charris · 2024-02-22T19:59:56Z

The actions label is gcc_sanitizers. All of the test runs show errors, some of which look valid, but none cause the test to fail. I have to wonder where the bogus values come from; are they byproducts of the sanitizer? See https://github.com/numpy/numpy/actions/runs/8008572322/job/21875289661 for examples.

I also note that the time was normally around 20 minutes, but it is now well in excess of 2 hours. Something has changed.

ngoldbaum · 2024-02-22T21:17:38Z

I believe all of those errors are from UBSan and will not fail that job until #24209 is fixed.

> I also note that the time was normally around 20 minutes, but it is now well in excess of 2 hours. Something has changed.

The job you linked to ran in 20 minutes. Do you have a job where it took hours to run?

ngoldbaum · 2024-02-22T21:27:26Z

Ah like this one: https://github.com/numpy/numpy/actions/runs/8008729929/job/21875802671

Yes, there's a heisenbug that crashes the test runner every so often. It only happens with the compiler sanitizers job and may be a bug in the GCC sanitizer implementation; I haven't been able to reproduce it with clang. It might also be a real issue.

charris · 2024-02-22T21:29:32Z

I cancelled that one; it wasn't about to finish any time soon.

EDIT: But it was still running.

ngoldbaum · 2024-02-22T21:42:37Z

The crash leaves the test run hanging instead of ending, so the job only stops when it times out after six hours. I agree, not great!

mattip · 2024-03-13T11:04:51Z

Reopening until we are sure the test is no longer hanging. It also seems there is a failure that is not picked up by the pytest mechanism:

numpy/_core/tests/test_api.py::test_copyto_fromscalar ../numpy/_core/src/multiarray/common.h:288:31: runtime error: load of misaligned address 0x6020000c7212 for type 'unsigned int', which requires 4 byte alignment
0x6020000c7212: note: pointer points here
 00 00  00 01 00 00 00 01 00 00  00 00 00 00 00 00 00 00  00 11 00 00 04 00 00 00  07 00 00 3c 00 00
              ^ 
PASSED
numpy/_core/tests/test_api.py::test_copyto PASSED
numpy/_core/tests/test_api.py::test_copyto_permut ../numpy/_core/src/multiarray/common.h:288:31: runtime error: load of misaligned address 0x6020001d4492 for type 'unsigned int', which requires 4 byte alignment
0x6020001d4492: note: pointer points here
 00 00  00 01 00 01 00 01 00 01  00 00 00 00 00 00 00 00  03 11 00 00 09 00 00 00  07 00 00 3c 00 00
              ^ 
PASSED
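For readers unfamiliar with this report format: UBSan flags a load when the address is not a multiple of the alignment the type requires. A minimal sketch of that arithmetic, using the two addresses from the log above (the helper name `misalignment` is made up for illustration and is not part of NumPy or the sanitizer):

```python
def misalignment(address: int, alignment: int) -> int:
    """Return how many bytes `address` lies past the previous aligned boundary."""
    return address % alignment


# Addresses copied from the UBSan report above.
# The report says 'unsigned int' requires 4 byte alignment.
for addr in (0x6020000C7212, 0x6020001D4492):
    print(hex(addr), "is", misalignment(addr, 4), "bytes past a 4-byte boundary")
```

Both addresses sit 2 bytes past a 4-byte boundary, which is why the load is reported even though the tests themselves pass.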

mattip · 2024-03-13T11:06:12Z

Actually, searching that log for "runtime error" shows many of them...
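A rough way to summarize such a search is to group the "runtime error" lines by source location. The sketch below assumes the log has been saved as text and that each report follows the `file:line:column: runtime error: message` shape seen above; the function name and the sample text are illustrative only.

```python
import re
from collections import Counter

# Matches UBSan reports such as
# "../path/file.h:288:31: runtime error: load of misaligned address ..."
RUNTIME_ERROR = re.compile(r"(\S+?):(\d+):\d+: runtime error: (.+)")


def count_runtime_errors(log_text: str) -> Counter:
    """Count 'runtime error' reports per 'file:line' source location."""
    counts = Counter()
    for match in RUNTIME_ERROR.finditer(log_text):
        path, line, _message = match.groups()
        counts[f"{path}:{line}"] += 1
    return counts


# Sample lines copied from the excerpt quoted earlier in this thread.
SAMPLE_LOG = """\
numpy/_core/tests/test_api.py::test_copyto_fromscalar ../numpy/_core/src/multiarray/common.h:288:31: runtime error: load of misaligned address 0x6020000c7212 for type 'unsigned int', which requires 4 byte alignment
PASSED
numpy/_core/tests/test_api.py::test_copyto PASSED
numpy/_core/tests/test_api.py::test_copyto_permut ../numpy/_core/src/multiarray/common.h:288:31: runtime error: load of misaligned address 0x6020001d4492 for type 'unsigned int', which requires 4 byte alignment
PASSED
"""

for location, n in count_runtime_errors(SAMPLE_LOG).most_common():
    print(n, location)
```

Grouping by location makes it easier to see whether the many reports come from a handful of code paths or from many distinct ones.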

seberg · 2024-03-13T11:08:09Z

I think most of them are somewhat intentional, i.e. some code chooses to ignore alignment on platforms where we know that is OK (and probably better), but the sanitizers complain about it anyway.

Not sure what to do about those. Maybe those code paths were just optimizations from a time long past, and with a safe code path the compiler would generate fast code anyway.
(I am also OK with just ignoring the issue, since unaligned arrays are pretty rare either way.)

ngoldbaum · 2024-03-13T11:49:43Z

Those are all UBSan errors that won’t fail the build until #24209 is fixed.

mattip · 2024-03-13T14:02:56Z

Ahh, thanks, I missed that. I changed the title of #24209 so a search for sanitizer makes it more prominent.

ngoldbaum · 2024-03-13T21:12:39Z

I just looked at one of the recent failures. It looks like this test is failing in a new way, where if you look in the raw logs there are many many lines like:

2024-03-13T10:24:04.4788196Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4788381Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4788558Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4788738Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4788928Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4789110Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4789286Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4789470Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4789652Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4789834Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4790017Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4790193Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4790375Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4790560Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4790744Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4790933Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:04.4791116Z AddressSanitizer:DEADLYSIGNAL

I'm not sure why this is getting printed to stderr every 20 microseconds or so, and only on some test runs. It actually seems to start before the tests even begin executing:

2024-03-13T10:23:59.5807981Z Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
2024-03-13T10:23:59.5853061Z Downloading iniconfig-2.0.0-py3-none-any.whl (5.9 kB)
2024-03-13T10:23:59.7449512Z Installing collected packages: sortedcontainers, typing_extensions, pluggy, iniconfig, execnet, attrs, pytest, hypothesis, pytest-xdist
2024-03-13T10:24:00.3361522Z Successfully installed attrs-23.2.0 execnet-2.0.2 hypothesis-6.99.5 iniconfig-2.0.0 pluggy-1.4.0 pytest-8.1.1 pytest-xdist-3.5.0 sortedcontainers-2.4.0 typing_extensions-4.10.0
2024-03-13T10:24:00.6105621Z Invoking `build` prior to running tests:
2024-03-13T10:24:00.9137798Z $ /opt/hostedtoolcache/Python/3.11.8/x64/bin/python vendored-meson/meson/meson.py compile -C build
2024-03-13T10:24:00.9166244Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9167759Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9168550Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9169412Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9170111Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9170756Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9171404Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9175971Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9176791Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9177516Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9178080Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9178603Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9179115Z AddressSanitizer:DEADLYSIGNAL
2024-03-13T10:24:00.9179624Z AddressSanitizer:DEADLYSIGNAL

I guess if this gets to be too annoying we can disable the tests. We could also look into using the clang sanitizers, which might be more stable than the gcc sanitizers.
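The "every 20 microseconds or so" estimate can be checked from the raw-log timestamps. The sketch below assumes each raw-log line starts with a UTC timestamp with a fractional-seconds part followed by `Z`, as in the excerpts above; the function name is made up for illustration. `Decimal` is used because the timestamps carry seven fractional digits, one more than `datetime` stores.

```python
import re
from datetime import datetime, timezone
from decimal import Decimal

# "<date>T<time>.<fraction>Z <message>", as in the raw-log excerpts above.
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)Z (.*)$")


def deadlysignal_gaps_us(lines):
    """Return the gaps, in microseconds, between consecutive DEADLYSIGNAL lines."""
    stamps = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None or m.group(3) != "AddressSanitizer:DEADLYSIGNAL":
            continue
        whole = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
        whole = whole.replace(tzinfo=timezone.utc)
        # Keep all fractional digits by doing the arithmetic in Decimal.
        stamps.append(Decimal(int(whole.timestamp())) + Decimal("0." + m.group(2)))
    return [float((later - earlier) * 1_000_000)
            for earlier, later in zip(stamps, stamps[1:])]


# First four lines of the first excerpt above.
SAMPLE = [
    "2024-03-13T10:24:04.4788196Z AddressSanitizer:DEADLYSIGNAL",
    "2024-03-13T10:24:04.4788381Z AddressSanitizer:DEADLYSIGNAL",
    "2024-03-13T10:24:04.4788558Z AddressSanitizer:DEADLYSIGNAL",
    "2024-03-13T10:24:04.4788738Z AddressSanitizer:DEADLYSIGNAL",
]

print(deadlysignal_gaps_us(SAMPLE))
```

For these four lines the gaps come out at roughly 18 microseconds each, consistent with the estimate above.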

ngoldbaum · 2024-03-14T21:01:12Z

It's now failing on every run in the same way. I still don't understand why this is happening, so I've manually disabled the workflow in the GitHub Actions settings.

I think if we build numpy with clang we should be able to use the clang sanitizers, which are generally better tested (Google uses them internally on all code).

ngoldbaum · 2024-04-18T17:34:59Z

Darn, here's one that's hanging with the clang sanitizers: https://github.com/numpy/numpy/actions/runs/8741299034/job/23986999441

ngoldbaum · 2024-05-21T20:32:13Z

I think this hasn't happened in a while since the switch to clang, so I'm closing.

ngoldbaum changed the title ~~linux_compiler_sanitizers.yml shows multiple errors, but still passes.~~ Compiler sanitizers tests are hanging intermittently Feb 22, 2024

ngoldbaum changed the title ~~Compiler sanitizers tests are hanging intermittently~~ CI: Compiler sanitizers tests are hanging intermittently Feb 22, 2024

This was referenced Mar 7, 2024

BUG: address sanitizer test failure in new string dtype #25957

Closed

TST: remove usage of ProcessPoolExecutor in stringdtype tests #26006

Merged

rgommers added the component: CI label Mar 12, 2024

mattip closed this as completed in #26006 Mar 13, 2024

mattip reopened this Mar 13, 2024

ngoldbaum mentioned this issue Mar 14, 2024

BUG: raise error trying to coerce object arrays containing timedelta64('NaT') to StringDType #26024

Merged

ngoldbaum mentioned this issue Apr 16, 2024

CI: add llvm/clang sanitizer tests #26295

Merged

seberg closed this as completed in #26295 Apr 18, 2024

ngoldbaum reopened this Apr 18, 2024

ngoldbaum mentioned this issue Apr 18, 2024

CI: add llvm/clang sanitizer tests #26308

Closed

ngoldbaum closed this as completed May 21, 2024
