BUG: SIGABRT on using ThreadPoolExecutor with linalg.eigvalsh in v1.26.0b1 #24512

Closed
lagru opened this issue Aug 23, 2023 · 23 comments · Fixed by #24584

@lagru
Contributor
lagru commented Aug 23, 2023

Describe the issue:

In scikit-image, we have started to encounter unexpected crashes in numpy.linalg.eigvalsh when used via a ThreadPoolExecutor with NumPy 1.26.0b1.

I have now managed to reduce the reproducing example from scikit-image/scikit-image#6970 (comment) to one using only NumPy (see below and also scikit-image/scikit-image#7101 (comment)). That's why I am reasonably confident that the error originates on NumPy's side.


Reproduce the code example:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

assert np.__version__ == '1.26.0b1'

rng = np.random.default_rng(32)
matrices = (
    rng.random((5, 10, 10, 3, 3)),
    rng.random((5, 10, 10, 3, 3)),
    # rng.random((5, 10, 10, 3, 3)),
)

with ThreadPoolExecutor(max_workers=None) as ex:
    list(ex.map(lambda m: np.linalg.eigvalsh(m), matrices,))

Error message:

The behavior is erratic: most of the time I get the free(): invalid pointer SIGABRT, sometimes the traceback below concerning the illegal value, and very rarely no error at all. This seems to depend a bit on the size of the passed array and the number of concurrent tasks.

Traceback (most recent call last):
  File "/home/lg/Res/scikit-image/local/debug-pr7101.py", line 14, in <module>
    list(ex.map(lambda m: np.linalg.eigvalsh(m), matrices,))
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lg/Res/scikit-image/local/debug-pr7101.py", line 14, in <lambda>
    list(ex.map(lambda m: np.linalg.eigvalsh(m), matrices,))
                          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 1181, in eigvalsh
    w = gufunc(a, signature=signature, extobj=extobj)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: On entry to DSYEVD parameter number 8 had an illegal value

Runtime information:

Pre-built v1.26.0b1

1.26.0b1
3.11.3 (main, Jun  5 2023, 09:32:32) [GCC 13.1.1 20230429]

[{'numpy_version': '1.26.0b1',
  'python': '3.11.3 (main, Jun  5 2023, 09:32:32) [GCC 13.1.1 20230429]',
  'uname': uname_result(system='Linux', node='hue', release='6.4.11-arch2-1', version='#1 SMP PREEMPT_DYNAMIC Sat, 19 Aug 2023 15:38:34 +0000', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Haswell',
  'filepath': '/home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23.dev'}]

In-place build v1.26.0b1 from source

1.26.0b1
3.9.17 | packaged by conda-forge | (main, Aug 10 2023, 07:02:31)
[GCC 12.3.0]

[{'numpy_version': '1.26.0b1',
  'python': '3.9.17 | packaged by conda-forge | (main, Aug 10 2023, '
            '07:02:31) \n'
            '[GCC 12.3.0]',
  'uname': uname_result(system='Linux', node='hue', release='6.4.11-arch2-1', version='#1 SMP PREEMPT_DYNAMIC Sat, 19 Aug 2023 15:38:34 +0000', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL',
                                    'AVX512_SPR']}},
 {'architecture': 'Haswell',
  'filepath': '/home/lg/.local/lib/micromamba/envs/numpy-dev/lib/libopenblasp-r0.3.23.so',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23'}]

gdb --args python local/debug-pr7101.py

debug-pr7101.py contains the minimal example above.

$ gdb --args python local/debug-pr7101.py
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...

This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for /usr/bin/python3.11
Reading symbols from /home/lg/.cache/debuginfod_client/9efa8fdb1fce89c7a9f29802398a366b6c913a3e/debuginfo...
(gdb) r
Starting program: /home/lg/.local/lib/venv/skimagedev/bin/python local/debug-pr7101.py
Downloading separate debug info for /lib64/ld-linux-x86-64.so.2
Downloading separate debug info for system-supplied DSO at 0x7ffff7fc8000
Downloading separate debug info for /usr/lib/libc.so.6
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/../../numpy.libs/libgfortran-040039e1.so.5.0.0
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/../../numpy.libs/libquadmath-96973f99.so.0.0.0
[New Thread 0x7ffff3bff6c0 (LWP 10872)]
[New Thread 0x7ffff33fe6c0 (LWP 10873)]
[New Thread 0x7ffff0bfd6c0 (LWP 10874)]
[New Thread 0x7fffec3fc6c0 (LWP 10875)]
[New Thread 0x7fffe9bfb6c0 (LWP 10876)]
[New Thread 0x7fffe73fa6c0 (LWP 10877)]
[New Thread 0x7fffe6bf96c0 (LWP 10878)]
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_tests.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/linalg/_umath_linalg.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/fft/_pocketfft_internal.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/mtrand.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/bit_generator.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_common.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_bounded_integers.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_mt19937.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_philox.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_pcg64.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_sfc64.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_generator.cpython-311-x86_64-linux-gnu.so
[New Thread 0x7fffe170d6c0 (LWP 10879)]
[New Thread 0x7fffe0f0c6c0 (LWP 10880)]
free(): invalid pointer
free(): invalid pointer

Thread 9 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe170d6c0 (LWP 10879)]
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
at pthread_kill.c:44
Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6,
no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff748e8a3 in __pthread_kill_internal (signo=6, threadid=<optimized out>)
at pthread_kill.c:78
#2  0x00007ffff743e668 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff74264b8 in __GI_abort () at abort.c:79
#4  0x00007ffff7427390 in __libc_message (fmt=fmt@entry=0x7ffff75a3550 "%s\n")
at ../sysdeps/posix/libc_fatal.c:150
#5  0x00007ffff74987b7 in malloc_printerr (str=str@entry=0x7ffff75a102b "free(): invalid pointer")
at malloc.c:5765
#6  0x00007ffff749aa74 in _int_free (av=<optimized out>, p=<optimized out>,
have_lock=have_lock@entry=0) at malloc.c:4500
#7  0x00007ffff749d353 in __GI___libc_free (mem=<optimized out>) at malloc.c:3391
#8  0x00007fffe201d0a9 in void eigh_wrapper<double>(char, char, char**, long const*, long const*) [clone .constprop.0] ()
from /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/linalg/_umath_linalg.cpython-311-x86_64-linux-gnu.so
#9  0x00007ffff6a4cef5 in generic_wrapped_legacy_loop ()
from /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
#10 0x00007ffff6a5b9ce in ufunc_generic_fastcall ()
from /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
#11 0x00007ffff79f20d7 in _PyObject_VectorcallTstate (kwnames=<optimized out>,
nargsf=<optimized out>, args=<optimized out>, callable=0x7ffff3fb3840, tstate=0x555555ae04c0)
at ./Include/internal/pycore_call.h:92
#12 PyObject_Vectorcall (callable=0x7ffff3fb3840, args=<optimized out>, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:299
#13 0x00007ffff79e4379 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>,
throwflag=<optimized out>) at Python/ceval.c:4773
#14 0x00007ffff7a0a9e0 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff76ce318, tstate=0x555555ae04c0)
at ./Include/internal/pycore_ceval.h:73
#15 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7ffff76ce310, locals=0x0,
func=0x7fffe2369bc0, tstate=0x555555ae04c0) at Python/ceval.c:6438
#16 _PyFunction_Vectorcall (func=0x7fffe2369bc0, stack=0x7ffff76ce310, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:393
#17 0x00007ffff79f20d7 in _PyObject_VectorcallTstate (kwnames=<optimized out>,
nargsf=<optimized out>, args=<optimized out>, callable=0x7fffe2369bc0, tstate=0x555555ae04c0)
at ./Include/internal/pycore_call.h:92
#18 PyObject_Vectorcall (callable=0x7fffe2369bc0, args=<optimized out>, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:299
#19 0x00007ffff694407d in dispatcher_vectorcall ()
from /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
#20 0x00007ffff79f20d7 in _PyObject_VectorcallTstate (kwnames=<optimized out>,
nargsf=<optimized out>, args=<optimized out>, callable=0x7fffe2362db0, tstate=0x555555ae04c0)
--Type <RET> for more, q to quit, c to continue without paging--
at ./Include/internal/pycore_call.h:92
#21 PyObject_Vectorcall (callable=0x7fffe2362db0, args=<optimized out>, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:299
#22 0x00007ffff79e4379 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>,
throwflag=<optimized out>) at Python/ceval.c:4773
#23 0x00007ffff7a0a9e0 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff76ce2b0, tstate=0x555555ae04c0)
at ./Include/internal/pycore_ceval.h:73
#24 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7fffe1755498, locals=0x0,
func=0x7ffff77984a0, tstate=0x555555ae04c0) at Python/ceval.c:6438
#25 _PyFunction_Vectorcall (func=0x7ffff77984a0, stack=0x7fffe1755498, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:393
#26 0x00007ffff79e7e77 in do_call_core (use_tracing=<optimized out>, kwdict=0x7fffe17706c0,
callargs=0x7fffe1755480, func=0x7ffff77984a0, tstate=<optimized out>) at Python/ceval.c:7356
#27 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>,
throwflag=<optimized out>) at Python/ceval.c:5380
#28 0x00007ffff7a0a9e0 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff76ce188, tstate=0x555555ae04c0)
at ./Include/internal/pycore_ceval.h:73
#29 _PyEval_Vector (kwnames=<optimized out>, argcount=4, args=0x7ffff6f2f528, locals=0x0,
func=0x7fffe176d3a0, tstate=0x555555ae04c0) at Python/ceval.c:6438
#30 _PyFunction_Vectorcall (func=0x7fffe176d3a0, stack=0x7ffff6f2f528, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:393
#31 0x00007ffff79e7e77 in do_call_core (use_tracing=<optimized out>, kwdict=0x7ffff77f3b40,
callargs=0x7ffff6f2f510, func=0x7fffe176d3a0, tstate=<optimized out>) at Python/ceval.c:7356
#32 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>,
throwflag=<optimized out>) at Python/ceval.c:5380
#33 0x00007ffff7a2c403 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff76ce020, tstate=0x555555ae04c0)
at ./Include/internal/pycore_ceval.h:73
#34 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=0x7fffe170ce28,
locals=0x0, func=0x7fffe1fe6d40, tstate=0x555555ae04c0) at Python/ceval.c:6438
#35 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=0x7fffe170ce28,
func=0x7fffe1fe6d40) at Objects/call.c:393
#36 _PyObject_VectorcallTstate (tstate=0x555555ae04c0, callable=0x7fffe1fe6d40, args=0x7fffe170ce28,
nargsf=<optimized out>, kwnames=<optimized out>) at ./Include/internal/pycore_call.h:92
#37 0x00007ffff7a2c0c8 in method_vectorcall (method=<optimized out>,
args=0x7ffff7d6f6b0 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0)
at Objects/classobject.c:67
#38 0x00007ffff7af4fe0 in thread_run (boot_raw=0x7fffe1756760) at ./Modules/_threadmodule.c:1092
#39 0x00007ffff7acad28 in pythread_wrapper (arg=<optimized out>) at Python/thread_pthread.h:241
#40 0x00007ffff748c9eb in start_thread (arg=<optimized out>) at pthread_create.c:444
#41 0x00007ffff7510dfc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
(gdb)

Context for the issue:

This is currently blocking us from upgrading our dependency on NumPy to 1.26.0b1 for scikit-image in scikit-image/scikit-image#7101. It's been a very tricky thing to debug and I am a bit out of my depth now. :)

@mattip
Member
mattip commented Aug 23, 2023

Thanks for whittling it down to a minimal reproducer. It sounds like something to do with OpenBLAS threading:

  • it disappears when you use 1 worker (see the sketch after this list)
  • you sometimes see the OpenBLAS error message "On entry to DSYEVD parameter number 8 had an illegal value"
  • The problem goes away when you use the older OpenBLAS 0.3.23, rather than OpenBLAS from commit c2f4bdbb used in the 1.26.0b1 wheel
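
For reference, a minimal sketch of the first point: with a single worker the eigvalsh calls are serialized and the crash does not occur. This only reuses the array shapes from the reproducer above; nothing here is new API.

import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Same shapes as the reproducer; max_workers=1 serializes all eigvalsh calls.
rng = np.random.default_rng(32)
matrices = [rng.random((5, 10, 10, 3, 3)) for _ in range(2)]

with ThreadPoolExecutor(max_workers=1) as ex:
    results = list(ex.map(np.linalg.eigvalsh, matrices))
print([w.shape for w in results])  # [(5, 10, 10, 3), (5, 10, 10, 3)]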

Workarounds

Can you see how many threads are open when you use your worker pool? I think you might be getting 8 for each worker, and each of those threads allocates a working buffer.

I see OpenBLAS wants to use 8 threads on your machine. Could you control this with either threadpoolctl or via setting the OPENBLAS_NUM_THREADS environment variable?
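
A minimal sketch of both suggestions (assuming the threadpoolctl package is installed; the limit of 1 is just an example value):

# Option 1: restrict OpenBLAS before it is loaded:
#   OPENBLAS_NUM_THREADS=1 python debug-pr7101.py
# Option 2: restrict it at runtime with threadpoolctl.
import numpy as np
from threadpoolctl import threadpool_limits

a = np.random.default_rng(0).random((5, 10, 10, 3, 3))
with threadpool_limits(limits=1, user_api="blas"):
    w = np.linalg.eigvalsh(a)  # BLAS/LAPACK limited to one thread in this block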

Further analysis

@martin-frbg, do you know of anything that might have caused a regression between 0.3.23 and 0.3.23 + c2f4bdbb?

@martin-frbg

Not immediately aware of anything that could have caused this (btw. I use the Milestone feature of gh to track non-trivial changes for the next release). Will a build from source automatically pull in 0.3.23 on whatever platform? Incidentally, INFO=8 from DSYEVD means "your work array is too small".

@mattip
Member
mattip commented Aug 23, 2023

Will a build from source automatically pull in 0.3.23 on whatever platform

A build from source will pull in whatever is on the platform via pkg-config. The wheel builds download and provision a specific version to be available via pkg-config before building.

@martin-frbg

I can reproduce this, but only when I deliberately build libopenblas for a smaller NUM_THREADS than actually present in the target system. Are you building the "experimental" c2f4bdbb with the exact same parameters that the previously used OpenBLAS binary was built with, especially NUM_THREADS (which defaults to the number of cores in the build host)?
Having provisioned for fewer cores/threads than available has always been a situation to avoid with OpenBLAS. I thought I had successfully solved that problem by having OpenBLAS allocate auxiliary data structures on the fly if needed, but it looks like there may be a race condition in the current code, or an undersized array not handled by the fix.
At least currently it looks like the problem is not caused by anything done after the 0.3.23 release, but I have not figured out where memory management goes wrong. (The test case always runs fine under valgrind.)

@mattip
Member
mattip commented Aug 24, 2023

Hmm. Nothing changed in the build scripts since 0.3.23. But calling openblas_get_config64_ on the .so (on Ubuntu) results in

OpenBLAS 0.3.23.dev  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Zen MAX_THREADS=64
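
(For reference, a hedged sketch of how one might query that config string from the library bundled in the wheel via ctypes; the numpy.libs path layout and the glob pattern are assumptions about the manylinux wheel, and the 64_-suffixed symbol name is taken from the comment above.)

import ctypes
import glob
import os
import numpy as np

# Assumed wheel layout: the bundled OpenBLAS sits in numpy.libs/ next to numpy/.
libs_dir = os.path.join(os.path.dirname(np.__file__), os.pardir, "numpy.libs")
[path] = glob.glob(os.path.join(libs_dir, "libopenblas*.so"))  # expect one match
openblas = ctypes.CDLL(path)
get_config = openblas.openblas_get_config64_  # ILP64 builds suffix symbols with 64_
get_config.restype = ctypes.c_char_p
print(get_config().decode())  # e.g. "OpenBLAS 0.3.23.dev ... MAX_THREADS=64"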

I see this line in the windows make call:

make BINARY=$build_bits DYNAMIC_ARCH=1 USE_THREAD=1 USE_OPENMP=0 \
     NUM_THREADS=24 NO_WARMUP=1 NO_AFFINITY=1 CONSISTENT_FPCSR=1 \
...

and this in the posix make call

CFLAGS="$CFLAGS -fvisibility=protected" \
make BUFFERSIZE=20 DYNAMIC_ARCH=1 USE_OPENMP=0 NUM_THREADS=64 \
    BINARY=$bitness $interface64_flags $target_flags > /dev/null

So it does seem we are setting NUM_THREADS in the build; perhaps we should boost the number for Windows.

@lagru
Contributor Author
lagru commented Aug 24, 2023

Thanks for the quick feedback!

I see OpenBLAS wants to use 8 threads on your machine. Could you control this with either threadpoolctl [...]

In the failing environment with the pre-built NumPy 1.26.0b1:

$ python -m threadpoolctl -i numpy
[
  {
    "user_api": "blas",
    "internal_api": "openblas",
    "num_threads": 8,
    "prefix": "libopenblas",
    "filepath": "/home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so",
    "version": "0.3.23.dev",
    "threading_layer": "pthreads",
    "architecture": "Haswell"
  }
]

I get the same output in the environment with the passing self-built NumPy, except for the version field, which is "0.3.23" instead of "0.3.23.dev".

@lagru
Contributor Author
lagru commented Aug 24, 2023

Not sure if I am doing it wrong, but

OPENBLAS_NUM_THREADS=1 python local/debug-pr7101.py

seems to have no impact, regardless of the values 0, 1, 2, 4, 8, 16. The error persists.

@martin-frbg

Hm, strange. And if numpy is really running OpenBLAS with just 8 threads instead of "all it can get", you should be safely below either of the two platform-specific compile-time limits Matti mentioned. Maybe I just tried too hard to create a failing configuration of OpenBLAS and the actual problem is elsewhere? (BTW it is still a bit unclear to me from the logs scattered across multiple issue tickets what your hardware and operating system is. I see Windows mentioned, but the quoted paths look unixoid?)

@mattip
Member
mattip commented Aug 24, 2023

Sorry, I may have muddied the waters by mentioning Windows. The system under test is Linux + Python 3.11, as can be seen by opening the "Pre-built v1.26.0b1" details subsection above.

@lagru
Contributor Author
lagru commented Aug 24, 2023

Yes.

import sys; print(sys.version)
import platform; print(platform.platform())
# 3.11.3 (main, Jun  5 2023, 09:32:32) [GCC 13.1.1 20230429]
# Linux-6.4.11-arch2-1-x86_64-with-glibc2.38

You can also see this in action on our CI in scikit-image/scikit-image#7101. It fails on linux-cp3.11-pre, but also on Windows "Default Python311-x64-pre".

@martin-frbg

Thanks. With a bit of patience, the failures are also reproducible with the 0.3.23 release (and the build-time NUM_THREADS set to 24 on 4-core hardware). So at least no recent regression in OpenBLAS, and it seems to me that some of the "double free"/"invalid pointer" messages are generated before OpenBLAS gets initialized - at least before a DYNAMIC_ARCH build announces (with OPENBLAS_VERBOSE=2) which CPU it has detected.

@martin-frbg

gdb backtraces lead back to a free() in NumPy's eigh_wrapper from .../dist_packages/numpy/linalg/_umath_linalg.cpython-310-x86_64-linux-gnu.so, for which I lack a debuggable version (this is an Ubuntu 22 VM with Python 3.10.6, numpy upgraded to 1.26b via pip).

@mattip
Member
mattip commented Aug 24, 2023

That leads to here, which allocates some buffers, linearizes (copies) the input into the buffers, and then calls one of the call_evd functions, which call CHEEVD or ZHEEVD.

@mattip
Member
mattip commented Aug 24, 2023

I seem to recall seeing "On entry to F parameter number N had an illegal value" when we had register corruption on Windows. Maybe we are seeing something similar?

@martin-frbg

Possible, though right now I am not even sure that OpenBLAS' DSYEVD is ever reached. (Unless the python/numpy environment catches any write to stdout from Fortran code.)

@charris charris added this to the 1.26.0 release milestone Aug 24, 2023
@rgommers
Member

This may be a case where we should do the work of separating NumPy vs. OpenBLAS by comparing with Netlib and creating a pure C or Fortran reproducer for OpenBLAS if we do determine it's specific to OpenBLAS and not Netlib.

Cc @steppi who is working on streamlining that process as much as possible.

@steppi
Contributor
steppi commented Aug 24, 2023

This may be a case where we should do the work of separating NumPy vs. OpenBLAS by comparing with Netlib and creating a pure C or Fortran reproducer for OpenBLAS if we do determine it's specific to OpenBLAS and not Netlib.

Cc @steppi who is working on streamlining that process as much as possible.

On it!

@martin-frbg

Thanks. All I can say so far is that I see no evidence (neither from print statements added to the code nor from gdb breakpoints) that OpenBLAS' implementations of DSYEVD and XERBLA are ever entered in the sequence that leads to the LAPACK-like error message, and libopenblas does not feature in any gdb backtrace.

@martin-frbg

... and I see the exact same (mis)behaviour when I replace NumPy's libopenblas with 0.3.21, or 0.3.15.

@steppi
Contributor
steppi commented Aug 24, 2023

I'm seeing the same misbehavior with netlib reference BLAS as well. I'm using FlexiBLAS to swap out BLAS versions, and was able to replicate by building numpy from the branch maintenance/1.26.x.
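
(For context, FlexiBLAS selects its backend via the FLEXIBLAS environment variable. A hedged sketch of switching to the reference implementation, assuming a backend named NETLIB is configured locally; this is equivalent to running FLEXIBLAS=NETLIB python debug-pr7101.py:)

import os
os.environ["FLEXIBLAS"] = "NETLIB"  # must be set before the BLAS library is loaded
import numpy as np  # a NumPy built against FlexiBLAS now dispatches to reference LAPACK
print(np.linalg.eigvalsh(np.eye(3)))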

@steppi
Contributor
steppi commented Aug 29, 2023

I've identified that numpy is calling into the non-thread-safe lapack_lite rather than the LAPACK seen in np.show_config(). This seems to only occur with the Meson build, which is why @lagru couldn't reproduce when building with

python setup.py build_ext --inplace -j 4

I'm building with the following (one needs to set up FlexiBLAS for this to work), and have reproduced on main.

spin build --clean -- -Dblas=flexiblas -Dlapack=flexiblas

Below are some details of what I observed during the debugging process:

Within umath_linalg.cpp, in eigh_wrapper and the initialization code it calls (init_evd), I added print statements which include the thread id before and after each of the steps:

  1. Call init_evd
  2. Query for optimal work array size
  3. Initialize
  4. Do work
  5. Free

Of the three behaviors seen, things work correctly when one thread completes all of its work before the other. One sees the DSYEVD invalid-parameter error when the second thread queries for the optimal work array size while the first is doing work. One sees free(): invalid pointer when the first thread frees after the second thread calls init_evd but before it queries for the optimal work array size.

That this lack of thread safety is seen when numpy claims to be using either OpenBLAS or Netlib BLAS, whose DSYEVD implementations are thread-safe and battle-tested, is a pretty strong clue. After doing some digging, I found that lapack_lite is not thread safe and had a hunch it was getting called here for some reason. I put a print statement in the relevant part of the lapack_lite source and sure enough its output was printed when I ran the reproducer.

I'll keep looking into this to see what could be going wrong in the meson build.
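
Until a fix lands, one possible stopgap on affected builds is to serialize the calls yourself. A minimal sketch (this sacrifices the parallelism, and the lock and wrapper are our own illustration, not NumPy API):

import threading
import numpy as np
from concurrent.futures import ThreadPoolExecutor

_eigvalsh_lock = threading.Lock()  # hypothetical helper, not part of NumPy

def eigvalsh_serialized(m):
    # One call at a time, so the query/allocate/work/free steps of one
    # call cannot interleave with another thread's.
    with _eigvalsh_lock:
        return np.linalg.eigvalsh(m)

rng = np.random.default_rng(32)
matrices = [rng.random((5, 10, 10, 3, 3)) for _ in range(2)]
with ThreadPoolExecutor() as ex:
    results = list(ex.map(eigvalsh_serialized, matrices))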

@steppi
Contributor
steppi commented Aug 29, 2023

@rgommers found the issue in https://github.com/numpy/numpy/blob/main/numpy/linalg/meson.build. The _umath_linalg extension is missing blas and lapack in its sources. It's a simple fix.

rgommers added a commit to rgommers/numpy that referenced this issue Aug 30, 2023
Closes numpygh-24512, where `linalg.eigvalsh` was observed to be non-thread
safe. This was due to the non-thread safe `lapack_lite` being called
instead of the installed BLAS/LAPACK.

Co-authored-by: Albert Steppi <albert.steppi@gmail.com>
rgommers added a commit that referenced this issue Aug 30, 2023
Closes gh-24512, where `linalg.eigvalsh` was observed to be non-thread
safe. This was due to the non-thread safe `lapack_lite` being called
instead of the installed BLAS/LAPACK.

Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
charris pushed a commit to charris/numpy that referenced this issue Aug 30, 2023
Closes numpygh-24512, where `linalg.eigvalsh` was observed to be non-thread
safe. This was due to the non-thread safe `lapack_lite` being called
instead of the installed BLAS/LAPACK.

Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
@lagru
Contributor Author
lagru commented Aug 31, 2023

Thanks to everyone for tackling this so quickly! 👍

charris pushed a commit to charris/numpy that referenced this issue Nov 11, 2023
Closes numpygh-24512, where `linalg.eigvalsh` was observed to be non-thread
safe. This was due to the non-thread safe `lapack_lite` being called
instead of the installed BLAS/LAPACK.

Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>