BUG: SIGABRT on using ThreadPoolExecutor with linalg.eigvalsh in v1.26.0b1 #24512

Closed
lagru opened this issue Aug 23, 2023 · 23 comments · Fixed by #24584

@lagru
Contributor
lagru commented Aug 23, 2023

Describe the issue:

In scikit-image, we have started to encounter unexpected crashes in numpy.linalg.eigvalsh when used via a ThreadPoolExecutor with NumPy 1.26.0b1.

I have now managed to reduce the reproducing example from scikit-image/scikit-image#6970 (comment) to one using only NumPy (see below and also scikit-image/scikit-image#7101 (comment)). That's why I am reasonably confident that the error originates on NumPy's side.


Reproduce the code example:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

assert np.__version__ == '1.26.0b1'

rng = np.random.default_rng(32)
matrices = (
    rng.random((5, 10, 10, 3, 3)),
    rng.random((5, 10, 10, 3, 3)),
    # rng.random((5, 10, 10, 3, 3)),
)

with ThreadPoolExecutor(max_workers=None) as ex:
    list(ex.map(lambda m: np.linalg.eigvalsh(m), matrices,))

Error message:

The behavior is erratic: most of the time I get the free(): invalid pointer SIGABRT, sometimes the traceback below concerning the illegal value, and very rarely no error at all. This seems to depend a bit on the size of the passed array and the number of concurrent tasks.

Traceback (most recent call last):
  File "/home/lg/Res/scikit-image/local/debug-pr7101.py", line 14, in <module>
    list(ex.map(lambda m: np.linalg.eigvalsh(m), matrices,))
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lg/Res/scikit-image/local/debug-pr7101.py", line 14, in <lambda>
    list(ex.map(lambda m: np.linalg.eigvalsh(m), matrices,))
                          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 1181, in eigvalsh
    w = gufunc(a, signature=signature, extobj=extobj)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: On entry to DSYEVD parameter number 8 had an illegal value

Runtime information:

Pre-built v1.26.0b1

1.26.0b1
3.11.3 (main, Jun  5 2023, 09:32:32) [GCC 13.1.1 20230429]

[{'numpy_version': '1.26.0b1',
  'python': '3.11.3 (main, Jun  5 2023, 09:32:32) [GCC 13.1.1 20230429]',
  'uname': uname_result(system='Linux', node='hue', release='6.4.11-arch2-1', version='#1 SMP PREEMPT_DYNAMIC Sat, 19 Aug 2023 15:38:34 +0000', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Haswell',
  'filepath': '/home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23.dev'}]

In-place build v1.26.0b1 from source

1.26.0b1
3.9.17 | packaged by conda-forge | (main, Aug 10 2023, 07:02:31)
[GCC 12.3.0]

[{'numpy_version': '1.26.0b1',
  'python': '3.9.17 | packaged by conda-forge | (main, Aug 10 2023, '
            '07:02:31) \n'
            '[GCC 12.3.0]',
  'uname': uname_result(system='Linux', node='hue', release='6.4.11-arch2-1', version='#1 SMP PREEMPT_DYNAMIC Sat, 19 Aug 2023 15:38:34 +0000', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL',
                                    'AVX512_SPR']}},
 {'architecture': 'Haswell',
  'filepath': '/home/lg/.local/lib/micromamba/envs/numpy-dev/lib/libopenblasp-r0.3.23.so',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23'}]

gdb --args python local/debug-pr7101.py

debug-pr7101.py contains the minimal example above.

$ gdb --args python local/debug-pr7101.py
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...

This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for /usr/bin/python3.11
Reading symbols from /home/lg/.cache/debuginfod_client/9efa8fdb1fce89c7a9f29802398a366b6c913a3e/debuginfo...
(gdb) r
Starting program: /home/lg/.local/lib/venv/skimagedev/bin/python local/debug-pr7101.py
Downloading separate debug info for /lib64/ld-linux-x86-64.so.2
Downloading separate debug info for system-supplied DSO at 0x7ffff7fc8000
Downloading separate debug info for /usr/lib/libc.so.6
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/../../numpy.libs/libgfortran-040039e1.so.5.0.0
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/../../numpy.libs/libquadmath-96973f99.so.0.0.0
[New Thread 0x7ffff3bff6c0 (LWP 10872)]
[New Thread 0x7ffff33fe6c0 (LWP 10873)]
[New Thread 0x7ffff0bfd6c0 (LWP 10874)]
[New Thread 0x7fffec3fc6c0 (LWP 10875)]
[New Thread 0x7fffe9bfb6c0 (LWP 10876)]
[New Thread 0x7fffe73fa6c0 (LWP 10877)]
[New Thread 0x7fffe6bf96c0 (LWP 10878)]
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_tests.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/linalg/_umath_linalg.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/fft/_pocketfft_internal.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/mtrand.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/bit_generator.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_common.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_bounded_integers.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_mt19937.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_philox.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_pcg64.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_sfc64.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_generator.cpython-311-x86_64-linux-gnu.so
[New Thread 0x7fffe170d6c0 (LWP 10879)]
[New Thread 0x7fffe0f0c6c0 (LWP 10880)]
free(): invalid pointer
free(): invalid pointer

Thread 9 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe170d6c0 (LWP 10879)]
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
at pthread_kill.c:44
Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6,
no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff748e8a3 in __pthread_kill_internal (signo=6, threadid=<optimized out>)
at pthread_kill.c:78
#2  0x00007ffff743e668 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff74264b8 in __GI_abort () at abort.c:79
#4  0x00007ffff7427390 in __libc_message (fmt=fmt@entry=0x7ffff75a3550 "%s\n")
at ../sysdeps/posix/libc_fatal.c:150
#5  0x00007ffff74987b7 in malloc_printerr (str=str@entry=0x7ffff75a102b "free(): invalid pointer")
at malloc.c:5765
#6  0x00007ffff749aa74 in _int_free (av=<optimized out>, p=<optimized out>,
have_lock=have_lock@entry=0) at malloc.c:4500
#7  0x00007ffff749d353 in __GI___libc_free (mem=<optimized out>) at malloc.c:3391
#8  0x00007fffe201d0a9 in void eigh_wrapper<double>(char, char, char**, long const*, long const*) [clone .constprop.0] ()
from /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/linalg/_umath_linalg.cpython-311-x86_64-linux-gnu.so
#9  0x00007ffff6a4cef5 in generic_wrapped_legacy_loop ()
from /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
#10 0x00007ffff6a5b9ce in ufunc_generic_fastcall ()
from /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
#11 0x00007ffff79f20d7 in _PyObject_VectorcallTstate (kwnames=<optimized out>,
nargsf=<optimized out>, args=<optimized out>, callable=0x7ffff3fb3840, tstate=0x555555ae04c0)
at ./Include/internal/pycore_call.h:92
#12 PyObject_Vectorcall (callable=0x7ffff3fb3840, args=<optimized out>, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:299
#13 0x00007ffff79e4379 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>,
throwflag=<optimized out>) at Python/ceval.c:4773
#14 0x00007ffff7a0a9e0 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff76ce318, tstate=0x555555ae04c0)
at ./Include/internal/pycore_ceval.h:73
#15 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7ffff76ce310, locals=0x0,
func=0x7fffe2369bc0, tstate=0x555555ae04c0) at Python/ceval.c:6438
#16 _PyFunction_Vectorcall (func=0x7fffe2369bc0, stack=0x7ffff76ce310, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:393
#17 0x00007ffff79f20d7 in _PyObject_VectorcallTstate (kwnames=<optimized out>,
nargsf=<optimized out>, args=<optimized out>, callable=0x7fffe2369bc0, tstate=0x555555ae04c0)
at ./Include/internal/pycore_call.h:92
#18 PyObject_Vectorcall (callable=0x7fffe2369bc0, args=<optimized out>, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:299
#19 0x00007ffff694407d in dispatcher_vectorcall ()
from /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
#20 0x00007ffff79f20d7 in _PyObject_VectorcallTstate (kwnames=<optimized out>,
nargsf=<optimized out>, args=<optimized out>, callable=0x7fffe2362db0, tstate=0x555555ae04c0)
--Type <RET> for more, q to quit, c to continue without paging--
at ./Include/internal/pycore_call.h:92
#21 PyObject_Vectorcall (callable=0x7fffe2362db0, args=<optimized out>, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:299
#22 0x00007ffff79e4379 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>,
throwflag=<optimized out>) at Python/ceval.c:4773
#23 0x00007ffff7a0a9e0 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff76ce2b0, tstate=0x555555ae04c0)
at ./Include/internal/pycore_ceval.h:73
#24 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7fffe1755498, locals=0x0,
func=0x7ffff77984a0, tstate=0x555555ae04c0) at Python/ceval.c:6438
#25 _PyFunction_Vectorcall (func=0x7ffff77984a0, stack=0x7fffe1755498, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:393
#26 0x00007ffff79e7e77 in do_call_core (use_tracing=<optimized out>, kwdict=0x7fffe17706c0,
callargs=0x7fffe1755480, func=0x7ffff77984a0, tstate=<optimized out>) at Python/ceval.c:7356
#27 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>,
throwflag=<optimized out>) at Python/ceval.c:5380
#28 0x00007ffff7a0a9e0 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff76ce188, tstate=0x555555ae04c0)
at ./Include/internal/pycore_ceval.h:73
#29 _PyEval_Vector (kwnames=<optimized out>, argcount=4, args=0x7ffff6f2f528, locals=0x0,
func=0x7fffe176d3a0, tstate=0x555555ae04c0) at Python/ceval.c:6438
#30 _PyFunction_Vectorcall (func=0x7fffe176d3a0, stack=0x7ffff6f2f528, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:393
#31 0x00007ffff79e7e77 in do_call_core (use_tracing=<optimized out>, kwdict=0x7ffff77f3b40,
callargs=0x7ffff6f2f510, func=0x7fffe176d3a0, tstate=<optimized out>) at Python/ceval.c:7356
#32 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>,
throwflag=<optimized out>) at Python/ceval.c:5380
#33 0x00007ffff7a2c403 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff76ce020, tstate=0x555555ae04c0)
at ./Include/internal/pycore_ceval.h:73
#34 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=0x7fffe170ce28,
locals=0x0, func=0x7fffe1fe6d40, tstate=0x555555ae04c0) at Python/ceval.c:6438
#35 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=0x7fffe170ce28,
func=0x7fffe1fe6d40) at Objects/call.c:393
#36 _PyObject_VectorcallTstate (tstate=0x555555ae04c0, callable=0x7fffe1fe6d40, args=0x7fffe170ce28,
nargsf=<optimized out>, kwnames=<optimized out>) at ./Include/internal/pycore_call.h:92
#37 0x00007ffff7a2c0c8 in method_vectorcall (method=<optimized out>,
args=0x7ffff7d6f6b0 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0)
at Objects/classobject.c:67
#38 0x00007ffff7af4fe0 in thread_run (boot_raw=0x7fffe1756760) at ./Modules/_threadmodule.c:1092
#39 0x00007ffff7acad28 in pythread_wrapper (arg=<optimized out>) at Python/thread_pthread.h:241
#40 0x00007ffff748c9eb in start_thread (arg=<optimized out>) at pthread_create.c:444
#41 0x00007ffff7510dfc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
(gdb)

Context for the issue:

This is currently blocking us from upgrading our dependency on NumPy to 1.26.0b1 for scikit-image in scikit-image/scikit-image#7101. It's been a very tricky thing to debug and I am a bit out of my depth now. :)

@mattip
Member
mattip commented Aug 23, 2023

Thanks for whittling it down to a minimal reproducer. It sounds like something to do with OpenBLAS threading:

  • it disappears when you use 1 worker (see the sketch after this list)
  • you sometimes see the OpenBLAS error message "On entry to DSYEVD parameter number 8 had an illegal value"
  • The problem goes away when you use the older OpenBLAS 0.3.23, rather than OpenBLAS from commit c2f4bdbb used in the 1.26.0b1 wheel
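
For reference, a minimal sketch of the first point: with a single worker the eigvalsh calls are serialized and the crash does not occur. This only reuses the array shapes from the reproducer above; nothing here is new API.

import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Same shapes as the reproducer; max_workers=1 serializes all eigvalsh calls.
rng = np.random.default_rng(32)
matrices = [rng.random((5, 10, 10, 3, 3)) for _ in range(2)]

with ThreadPoolExecutor(max_workers=1) as ex:
    results = list(ex.map(np.linalg.eigvalsh, matrices))
print([w.shape for w in results])  # [(5, 10, 10, 3), (5, 10, 10, 3)]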

Workarounds

Can you see how many threads are open when you use your worker pool? I think you might be getting 8 for each worker, and each of those threads allocates a working buffer.

I see OpenBLAS wants to use 8 threads on your machine. Could you control this with either threadpoolctl or via setting the OPENBLAS_NUM_THREADS environment variable?
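
A minimal sketch of both suggestions (assuming the threadpoolctl package is installed; the limit of 1 is just an example value):

# Option 1: restrict OpenBLAS before it is loaded:
#   OPENBLAS_NUM_THREADS=1 python debug-pr7101.py
# Option 2: restrict it at runtime with threadpoolctl.
import numpy as np
from threadpoolctl import threadpool_limits

a = np.random.default_rng(0).random((5, 10, 10, 3, 3))
with threadpool_limits(limits=1, user_api="blas"):
    w = np.linalg.eigvalsh(a)  # BLAS/LAPACK limited to one thread in this block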

Further analysis

@martin-frbg, do you know of anything that might have caused a regression between 0.3.23 and 0.3.23 + c2f4bdbb?

@martin-frbg

Not immediately aware of anything that could have caused this (btw. I use the Milestone feature of gh to track non-trivial changes for the next release). Will a build from source automatically pull in 0.3.23 on whatever platform? Incidentally, INFO=8 from DSYEVD means "your work array is too small".

@mattip
Member
mattip commented Aug 23, 2023

Will a build from source automatically pull in 0.3.23 on whatever platform

A build from source will pull in whatever is on the platform via pkg-config. The wheel builds download and provision a specific version to be available via pkg-config before building.

@martin-frbg

I can reproduce this, but only when I deliberately build libopenblas for a smaller NUM_THREADS than actually present in the target system. Are you building the "experimental" c2f4bdbb with the exact same parameters that the previously used OpenBLAS binary was built with, especially NUM_THREADS (which defaults to the number of cores in the build host)?
Having provisioned for fewer cores/threads than available has always been a situation to avoid with OpenBLAS. I thought I had successfully solved that problem by having OpenBLAS allocate auxiliary data structures on the fly if needed, but it looks like there may be a race condition in the current code, or an undersized array not handled by the fix.
At least currently it looks like the problem is not caused by anything done after the 0.3.23 release, but I have not figured out where memory management goes wrong. (The test case always runs fine under valgrind.)

@mattip
Member
mattip commented Aug 24, 2023

Hmm. Nothing changed in the build scripts since 0.3.23. But calling openblas_get_config64_ on the .so (on Ubuntu) results in

OpenBLAS 0.3.23.dev  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Zen MAX_THREADS=64
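
(For reference, a hedged sketch of how one might query that config string from the library bundled in the wheel via ctypes; the numpy.libs path layout and the glob pattern are assumptions about the manylinux wheel, and the 64_-suffixed symbol name is taken from the comment above.)

import ctypes
import glob
import os
import numpy as np

# Assumed wheel layout: the bundled OpenBLAS sits in numpy.libs/ next to numpy/.
libs_dir = os.path.join(os.path.dirname(np.__file__), os.pardir, "numpy.libs")
[path] = glob.glob(os.path.join(libs_dir, "libopenblas*.so"))  # expect one match
openblas = ctypes.CDLL(path)
get_config = openblas.openblas_get_config64_  # ILP64 builds suffix symbols with 64_
get_config.restype = ctypes.c_char_p
print(get_config().decode())  # e.g. "OpenBLAS 0.3.23.dev ... MAX_THREADS=64"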

I see this line in the windows make call:

make BINARY=$build_bits DYNAMIC_ARCH=1 USE_THREAD=1 USE_OPENMP=0 \
     NUM_THREADS=24 NO_WARMUP=1 NO_AFFINITY=1 CONSISTENT_FPCSR=1 \
...

and this in the posix make call

CFLAGS="$CFLAGS -fvisibility=protected" \
make BUFFERSIZE=20 DYNAMIC_ARCH=1 USE_OPENMP=0 NUM_THREADS=64 \
    BINARY=$bitness $interface64_flags $target_flags > /dev/null

So it does seem we are setting NUM_THREADS in the build; perhaps we should boost the number for Windows.

@lagru
Contributor Author
lagru commented Aug 24, 2023

Thanks for the quick feedback!

I see OpenBLAS wants to use 8 threads on your machine. Could you control this with either threadpoolctl [...]

In the failing environment with the pre-built NumPy 1.26.0b1:

$ python -m threadpoolctl -i numpy
[
  {
    "user_api": "blas",
    "internal_api": "openblas",
    "num_threads": 8,
    "prefix": "libopenblas",
    "filepath": "/home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so",
    "version": "0.3.23.dev",
    "threading_layer": "pthreads",
    "architecture": "Haswell"
  }
]

I get the same output in the environment with the passing self-built NumPy, except for the version field, which is "0.3.23" instead of "0.3.23.dev".

@lagru
Contributor Author
lagru commented Aug 24, 2023

Not sure if I am doing it wrong, but

OPENBLAS_NUM_THREADS=1 python local/debug-pr7101.py

seems to have no impact, regardless of the values 0, 1, 2, 4, 8, 16. The error persists.

@martin-frbg

Hm, strange. And if numpy is really running OpenBLAS with just 8 threads instead of "all it can get", you should be safely below either of the two platform-specific compile-time limits Matti mentioned. Maybe I just tried too hard to create a failing configuration of OpenBLAS and the actual problem is elsewhere? (BTW it is still a bit unclear to me from the logs scattered across multiple issue tickets what your hardware and operating system is. I see Windows mentioned, but the quoted paths look unixoid?)

@mattip
Member
mattip commented Aug 24, 2023

Sorry, I may have muddied the waters by mentioning Windows. The system under test is Linux + Python 3.11, as can be seen by opening the "Pre-built v1.26.0b1" details subsection above.

@lagru
Contributor Author
lagru commented Aug 24, 2023

Yes.

import sys; print(sys.version)
import platform; print(platform.platform())
# 3.11.3 (main, Jun  5 2023, 09:32:32) [GCC 13.1.1 20230429]
# Linux-6.4.11-arch2-1-x86_64-with-glibc2.38

You can also see this in action on our CI in scikit-image/scikit-image#7101. It fails on linux-cp3.11-pre, but also on Windows "Default Python311-x64-pre".

@martin-frbg

Thanks. With a bit of patience, the failures are also reproducible with the 0.3.23 release (and the build-time NUM_THREADS set to 24 on 4-core hardware). So at least no recent regression in OpenBLAS, and it seems to me that some of the "double free"/"invalid pointer" messages are generated before OpenBLAS gets initialized - at least before a DYNAMIC_ARCH build announces (with OPENBLAS_VERBOSE=2) which CPU it has detected.

@martin-frbg

gdb backtraces lead back to a free() in NumPy's eigh_wrapper from .../dist_packages/numpy/linalg/_umath_linalg.cpython-310-x86_64-linux-gnu.so, for which I lack a debuggable version (this is an Ubuntu 22 VM with Python 3.10.6, numpy upgraded to 1.26b via pip).

@mattip
Member
mattip commented Aug 24, 2023

That leads to here, which allocates some buffers, linearizes (copies) the input into the buffers, and then calls one of the call_evd functions, which call CHEEVD or ZHEEVD.

@mattip
Member
mattip commented Aug 24, 2023

I seem to recall seeing "On entry to F parameter number N had an illegal value" when we had register corruption on Windows. Maybe we are seeing something similar?

@martin-frbg

Possible, though right now I am not even sure that OpenBLAS' DSYEVD is ever reached. (Unless the python/numpy environment catches any write to stdout from Fortran code.)

@charris charris added this to the 1.26.0 release milestone Aug 24, 2023
@rgommers
Member

This may be a case where we should do the work of separating NumPy vs. OpenBLAS by comparing with Netlib and creating a pure C or Fortran reproducer for OpenBLAS if we do determine it's specific to OpenBLAS and not Netlib.

Cc @steppi who is working on streamlining that process as much as possible.

@steppi
Contributor
steppi commented Aug 24, 2023

This may be a case where we should do the work of separating NumPy vs. OpenBLAS by comparing with Netlib and creating a pure C or Fortran reproducer for OpenBLAS if we do determine it's specific to OpenBLAS and not Netlib.

Cc @steppi who is working on streamlining that process as much as possible.

On it!

@martin-frbg

Thanks. All I can say so far is that I see no evidence (neither from print statements added to the code nor from gdb breakpoints) that OpenBLAS' implementations of DSYEVD and XERBLA are ever entered in the sequence that leads to the LAPACK-like error message, and libopenblas does not feature in any gdb backtrace.

@martin-frbg

... and I see the exact same (mis)behaviour when I replace NumPy's libopenblas with 0.3.21, or 0.3.15.

@steppi
Contributor
steppi commented Aug 24, 2023

I'm seeing the same misbehavior with netlib reference BLAS as well. I'm using FlexiBLAS to swap out BLAS versions, and was able to replicate by building numpy from the branch maintenance/1.26.x.
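
(For context, FlexiBLAS selects its backend via the FLEXIBLAS environment variable. A hedged sketch of switching to the reference implementation, assuming a backend named NETLIB is configured locally; this is equivalent to running FLEXIBLAS=NETLIB python debug-pr7101.py:)

import os
os.environ["FLEXIBLAS"] = "NETLIB"  # must be set before the BLAS library is loaded
import numpy as np  # a NumPy built against FlexiBLAS now dispatches to reference LAPACK
print(np.linalg.eigvalsh(np.eye(3)))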

@steppi
Contributor
steppi commented Aug 29, 2023

I've identified that numpy is calling into the non-thread-safe lapack_lite rather than the LAPACK seen in np.show_config(). This seems to only occur with the Meson build, which is why @lagru couldn't reproduce when building with

python setup.py build_ext --inplace -j 4

I'm building with the following (one needs to set up FlexiBLAS for this to work), and have reproduced on main.

spin build --clean -- -Dblas=flexiblas -Dlapack=flexiblas

Below are some details of what I observed during the debugging process:

Within umath_linalg.cpp, in eigh_wrapper and the initialization code it calls (init_evd), I added print statements which include the thread id before and after each of the steps:

  1. Call init_evd
  2. Query for optimal work array size
  3. Initialize
  4. Do work
  5. Free

Of the three behaviors seen, things work correctly when one thread completes all of its work before the other. One sees the DSYEVD invalid-parameter error when the second thread queries for the optimal work array size while the first is doing work. One sees free(): invalid pointer when the first thread frees after the second thread calls init_evd but before it queries for the optimal work array size.

That this lack of thread safety is seen when numpy claims to be using either OpenBLAS or Netlib BLAS, whose DSYEVD implementations are thread-safe and battle-tested, is a pretty strong clue. After doing some digging, I found that lapack_lite is not thread safe and had a hunch it was getting called here for some reason. I put a print statement in the relevant part of the lapack_lite source and sure enough its output was printed when I ran the reproducer.

I'll keep looking into this to see what could be going wrong in the meson build.
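
Until a fix lands, one possible stopgap on affected builds is to serialize the calls yourself. A minimal sketch (this sacrifices the parallelism, and the lock and wrapper are our own illustration, not NumPy API):

import threading
import numpy as np
from concurrent.futures import ThreadPoolExecutor

_eigvalsh_lock = threading.Lock()  # hypothetical helper, not part of NumPy

def eigvalsh_serialized(m):
    # One call at a time, so the query/allocate/work/free steps of one
    # call cannot interleave with another thread's.
    with _eigvalsh_lock:
        return np.linalg.eigvalsh(m)

rng = np.random.default_rng(32)
matrices = [rng.random((5, 10, 10, 3, 3)) for _ in range(2)]
with ThreadPoolExecutor() as ex:
    results = list(ex.map(eigvalsh_serialized, matrices))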

@steppi
Contributor
steppi commented Aug 29, 2023

@rgommers found the issue in https://github.com/numpy/numpy/blob/main/numpy/linalg/meson.build. The _umath_linalg extension is missing blas and lapack in its sources. It's a simple fix.

rgommers added a commit to rgommers/numpy that referenced this issue Aug 30, 2023
Closes numpygh-24512, where `linalg.eigvalsh` was observed to be non-thread
safe. This was due to the non-thread safe `lapack_lite` being called
instead of the installed BLAS/LAPACK.

Co-authored-by: Albert Steppi <albert.steppi@gmail.com>
rgommers added a commit that referenced this issue Aug 30, 2023
Closes gh-24512, where `linalg.eigvalsh` was observed to be non-thread
safe. This was due to the non-thread safe `lapack_lite` being called
instead of the installed BLAS/LAPACK.

Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
charris pushed a commit to charris/numpy that referenced this issue Aug 30, 2023
Closes numpygh-24512, where `linalg.eigvalsh` was observed to be non-thread
safe. This was due to the non-thread safe `lapack_lite` being called
instead of the installed BLAS/LAPACK.

Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
@lagru
Contributor Author
lagru commented Aug 31, 2023

Thanks to everyone for tackling this so quickly! 👍

charris pushed a commit to charris/numpy that referenced this issue Nov 11, 2023
Closes numpygh-24512, where `linalg.eigvalsh` was observed to be non-thread
safe. This was due to the non-thread safe `lapack_lite` being called
instead of the installed BLAS/LAPACK.

Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>