BUG: Python (debug mode+free threading) segfaults at exit since Numpy 2.2.0 · Issue #27953 · numpy/numpy · GitHub

BUG: Python (debug mode+free threading) segfaults at exit since Numpy 2.2.0 #27953


Closed
cavokz opened this issue Dec 9, 2024 · 18 comments · Fixed by #27955
Labels
00 - Bug 39 - free-threading PRs and issues related to support for free-threading CPython (a.k.a. no-GIL, PEP 703)
Comments

@cavokz
cavokz commented Dec 9, 2024

Describe the issue:

Numpy 2.1.3 works fine but with 2.2.0 the interpreter segfaults at exit.

Both Python 3.13.0t and 3.13.1t segfault at exit; neither 3.13.0 nor 3.13.1 does.

I can reproduce this on macOS (amd64), Debian 11 (amd64) and Debian 12 (arm64). Other combinations might be affected as well.

All the interpreters are built in debug mode with pyenv, e.g. pyenv install -g 3.13.1t. I did not check non-debug builds on every platform, but a non-debug Python 3.13.1t (pyenv install 3.13.1t), at least on macOS, does not exhibit the segfault.

The discriminating Python build options seem to be:

  1. debug mode
  2. free threading
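To confirm that a given interpreter has both of these build options, a small check along the following lines may help. It is a minimal sketch: the helper name build_flags is made up for illustration, and it relies on sys.gettotalrefcount existing only on debug builds and on the Py_GIL_DISABLED config variable being set on free-threaded builds.

```python
import sys
import sysconfig


def build_flags():
    """Return (is_debug, is_free_threaded) for the running interpreter.

    build_flags is a hypothetical helper name, not part of any library.
    """
    # sys.gettotalrefcount() is only available on debug builds.
    is_debug = hasattr(sys, "gettotalrefcount")
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    is_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    return is_debug, is_free_threaded


debug, free_threaded = build_flags()
print(f"debug build: {debug}, free-threaded build: {free_threaded}")
```

If both values are True, the interpreter matches the configuration described in this report.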

Reproduce the code example:

import numpy
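Because the crash happens at interpreter exit, it is easiest to detect from a parent process. The sketch below runs the one-line reproducer in a child interpreter and inspects the return code; the helper name exits_cleanly is made up for illustration, and the "negative return code means killed by a signal" convention is assumed to hold (it does on POSIX systems, not on Windows).

```python
import subprocess
import sys


def exits_cleanly(code: str) -> bool:
    """Run `code` in a child interpreter and report whether it exited with 0.

    exits_cleanly is a hypothetical helper name, not part of any library.
    """
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True)
    if proc.returncode < 0:
        # On POSIX, a negative return code is the number of the signal
        # that killed the child (11 would be SIGSEGV on Linux and macOS).
        print(f"child was killed by signal {-proc.returncode}")
    return proc.returncode == 0


if __name__ == "__main__":
    # The reproducer from this report: importing numpy and then exiting.
    print("clean exit:", exits_cleanly("import numpy"))
```

A False result only says the child did not exit with status 0; a missing numpy installation would also produce it, so the printed signal number is the part to look at.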

Error message (from gdb on Debian 12 arm64):

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3362
3362	./malloc/malloc.c: No such file or directory.
(gdb) bt
#0  __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3362
#1  0x0000fffff447e5ac in ufunc_dealloc ()
   from /home/cavok/.pyenv/versions/3.13.1t-debug/lib/python3.13t/site-packages/numpy/_core/_multiarray_umath.cpython-313t-aarch64-linux-gnu.so
#2  0x0000fffff799daf8 in _Py_Dealloc (op=0x2000f4d1a50) at Objects/object.c:2918
#3  0x0000fffff7999794 in _Py_MergeZeroLocalRefcount (op=0x2000f4d1a50) at Objects/object.c:423
#4  0x0000fffff7972ea0 in Py_DECREF (filename=0xfffff7c918f8 "./Include/object.h", lineno=1042, op=0x2000f4d1a50) at ./Include/object.h:892
#5  0x0000fffff7972f1c in Py_XDECREF (op=0x2000f4d1a50) at ./Include/object.h:1042
#6  0x0000fffff79753dc in dictkeys_decref (interp=0xfffff7f10280 <_PyRuntime+128704>, dk=0x2000f858c10, use_qsbr=false) at Objects/dictobject.c:496
#7  0x0000fffff797c284 in dict_dealloc (self=0x2000f78fcd0) at Objects/dictobject.c:3179
#8  0x0000fffff799daf8 in _Py_Dealloc (op=0x2000f78fcd0) at Objects/object.c:2918
#9  0x0000fffff7999794 in _Py_MergeZeroLocalRefcount (op=0x2000f78fcd0) at Objects/object.c:423
#10 0x0000fffff7b38344 in Py_DECREF (filename=0xfffff7d023d8 "./Include/object.h", lineno=1042, op=0x2000f78fcd0) at ./Include/object.h:892
#11 0x0000fffff7b383c0 in Py_XDECREF (op=0x2000f78fcd0) at ./Include/object.h:1042
#12 0x0000fffff7b3adb8 in del_cached_m_dict (value=0xaaaaaab10780) at Python/import.c:1119
#13 0x0000fffff7b3aeac in del_extensions_cache_value (value=0xaaaaaab10780) at Python/import.c:1149
#14 0x0000fffff7b37d00 in _Py_hashtable_destroy_entry (ht=0xaaaaaaad7e50, entry=0xaaaaaab107e0) at Python/hashtable.c:385
#15 0x0000fffff7b37e1c in _Py_hashtable_destroy (ht=0xaaaaaaad7e50) at Python/hashtable.c:417
#16 0x0000fffff7b3b7f0 in _extensions_cache_clear_all () at Python/import.c:1452
#17 0x0000fffff7b40d98 in _PyImport_Fini () at Python/import.c:4014
#18 0x0000fffff7b754f8 in _Py_Finalize (runtime=0xfffff7ef0bc0 <_PyRuntime>) at Python/pylifecycle.c:2129
#19 0x0000fffff7b7557c in Py_FinalizeEx () at Python/pylifecycle.c:2215
#20 0x0000fffff7bb98a0 in Py_RunMain () at Modules/main.c:777
#21 0x0000fffff7bb9938 in pymain_main (args=0xfffffffff2c0) at Modules/main.c:805
#22 0x0000fffff7bb99ac in Py_BytesMain (argc=6, argv=0xfffffffff478) at Modules/main.c:829
#23 0x0000aaaaaaaa0970 in main (argc=6, argv=0xfffffffff478) at ./Programs/python.c:15

Python and NumPy Versions:

3.13.1 experimental free-threading build (main, Dec 9 2024, 11:53:27) [Clang 16.0.0 (clang-1600.0.26.4)]
2.2.0

Context for the issue:

This affects the Pygolo Project CI pipeline (logs), which has a minimal interoperability extension test based on Numpy. I will pin Numpy to 2.1.3 for now, but I'll be happy to revert that as soon as this bug is fixed.

@cavokz cavokz added the 00 - Bug label Dec 9, 2024
@cavokz cavokz changed the title BUG: Python (with free threading) segfault at exit since Numpy 2.2.0 BUG: Python (free threading) segfaults at exit since Numpy 2.2.0 Dec 9, 2024
@cavokz cavokz changed the title BUG: Python (free threading) segfaults at exit since Numpy 2.2.0 BUG: Python (debug mode+free threading) segfaults at exit since Numpy 2.2.0 Dec 9, 2024
@ngoldbaum
Member

I'm not able to reproduce this on a fresh debug build of python 3.13.1t, with numpy either built from source from main or installed from the numpy 2.2.0 wheel on pypi, on my ARM MacBook Pro:

goldbaum at Nathans-MBP in ~/Documents/numpy on wheel-build-timeouts
± python -m pip install -v . --no-build-isolation -Cbuilddir=build -C'compile-args=-v' -C'setup-args=-Dbuildtype=debug'
(lots of output elided)
goldbaum at Nathans-MBP in ~/Documents
○  python
Python 3.13.1 experimental free-threading build (main, Dec  9 2024, 10:24:44) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import sys; print(sys.version_info)
sys.version_info(major=3, minor=13, micro=1, releaselevel='final', serial=0)
>>> import sysconfig
>>> sysconfig.get_config_var("CFLAGS")
'-fno-strict-overflow -Wsign-compare -g -Og -Wall -I/opt/homebrew/opt/zlib  -O0 -I/opt/homebrew/opt/zlib'
>>> sys.gettotalrefcount()
105532
>>> quit()

goldbaum at Nathans-MBP in ~/Documents
○  pip uninstall numpy
Found existing installation: numpy 2.3.0.dev0
Uninstalling numpy-2.3.0.dev0:
  Would remove:
    /Users/goldbaum/.pyenv/versions/3.13.1t-debug/bin/f2py
    /Users/goldbaum/.pyenv/versions/3.13.1t-debug/bin/numpy-config
    /Users/goldbaum/.pyenv/versions/3.13.1t-debug/lib/python3.13t/site-packages/numpy-2.3.0.dev0.dist-info/*
    /Users/goldbaum/.pyenv/versions/3.13.1t-debug/lib/python3.13t/site-packages/numpy/*
Proceed (Y/n)? y
  Successfully uninstalled numpy-2.3.0.dev0

goldbaum at Nathans-MBP in ~/Documents
○  pip install numpy
Collecting numpy
  Downloading numpy-2.2.0-cp313-cp313t-macosx_14_0_arm64.whl.metadata (62 kB)
Downloading numpy-2.2.0-cp313-cp313t-macosx_14_0_arm64.whl (5.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.2/5.2 MB 18.4 MB/s eta 0:00:00
/Users/goldbaum/.pyenv/versions/3.13.1t-debug/lib/python3.13t/site-packages/pip/_internal/metadata/importlib/_dists.py:77: DeprecationWarning: Unimplemented abstract methods {'locate_file'}
  return cls(files, info_location)
Installing collected packages: numpy
Successfully installed numpy-2.2.0

goldbaum at Nathans-MBP in ~/Documents
○  python
Python 3.13.1 experimental free-threading build (main, Dec  9 2024, 10:24:44) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>>

Trying on Linux next...

@cavokz
Author
cavokz commented Dec 9, 2024

I updated the issue later: it seems you also need debug mode enabled.

@ngoldbaum
Member

I built Python with debug mode.

@ngoldbaum
Member

It would help me if you could give more information about how you're installing numpy after building python.

@ngoldbaum ngoldbaum added the 39 - free-threading PRs and issues related to support for free-threading CPython (a.k.a. no-GIL, PEP 703) label Dec 9, 2024
@ngoldbaum
Member

I can't reproduce this on an amd64 Ubuntu 22.04 system either, using either a from-source numpy build or the 2.2.0 wheel.

@cavokz
Author
cavokz commented Dec 9, 2024

2.2.0:

$ pip install numpy
Collecting numpy
  Using cached numpy-2.2.0-cp313-cp313t-macosx_14_0_x86_64.whl.metadata (62 kB)
Using cached numpy-2.2.0-cp313-cp313t-macosx_14_0_x86_64.whl (6.7 MB)
/Users/cavok/.pyenv/versions/3.13.1t-debug/lib/python3.13t/site-packages/pip/_internal/metadata/importlib/_dists.py:77: DeprecationWarning: Unimplemented abstract methods {'locate_file'}
  return cls(files, info_location)
Installing collected packages: numpy
Successfully installed numpy-2.2.0
$ python3
Python 3.13.1 experimental free-threading build (main, Dec  9 2024, 11:53:27) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> quit()
zsh: segmentation fault  python

2.1.3:

$ pip install numpy==2.1.3
Collecting numpy==2.1.3
  Using cached numpy-2.1.3-cp313-cp313t-macosx_14_0_x86_64.whl.metadata (62 kB)
Using cached numpy-2.1.3-cp313-cp313t-macosx_14_0_x86_64.whl (6.6 MB)
/Users/cavok/.pyenv/versions/3.13.1t-debug/lib/python3.13t/site-packages/pip/_internal/metadata/importlib/_dists.py:77: DeprecationWarning: Unimplemented abstract methods {'locate_file'}
  return cls(files, info_location)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 2.2.0
    Uninstalling numpy-2.2.0:
      Successfully uninstalled numpy-2.2.0
Successfully installed numpy-2.1.3
$ python3
Python 3.13.1 experimental free-threading build (main, Dec  9 2024, 11:53:27) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> quit()

@ngoldbaum
Member

Oh interesting, it looks like I do get a seg fault when Python exits:

goldbaum at Nathans-MBP in ~/Documents
○  python
Python 3.13.1 experimental free-threading build (main, Dec  9 2024, 10:24:44) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> # this function is only available on debug builds
>>> sys.gettotalrefcount()
54380
>>> import numpy as np
>>> np.__version__
'2.2.0'
>>> quit()
[1]    89726 segmentation fault  python

@cavokz
Author
cavokz commented Dec 9, 2024

What changed on your side, np.__version__? I don't need that to get the segfault, which BTW is 100% reproducible for me.

@cavokz
Author
cavokz commented Dec 9, 2024

>>> # this function is only available on debug builds

Thanks for this tip!

@ngoldbaum
Member

What changed on your side, np.__version__? I don't need that to get the segfault, which BTW is 100% reproducible for me.

I was trying to prove that I don't hit the segfault on my machine, and I've only hit it doing exactly what I copy/pasted.

@cavokz
Author
cavokz commented Dec 9, 2024

What changed on your side, np.__version__? I don't need that to get the segfault, which BTW is 100% reproducible for me.

I was trying to prove that I don't hit the segfault on my machine, and I've only hit it doing exactly what I copy/pasted.

Is it 100% reproducible or just once in a while?

@ngoldbaum
Member
ngoldbaum commented Dec 9, 2024

I was able to trigger it on NumPy 2.2.0 but not on NumPy main, so I suspect a broken backport is the cause. Ping @charris.

Here's the traceback in a debug build of NumPy 2.2.0:

    frame #0: 0x0000000183b6411c libsystem_pthread.dylib`pthread_cond_destroy + 32
    frame #1: 0x0000000183a8bcc4 libc++.1.dylib`std::__1::condition_variable::~condition_variable() + 24
    frame #2: 0x0000000101870850 _multiarray_umath.cpython-313td-darwin.so`::PyArrayIdentityHash_Dealloc(PyArrayIdentityHash *) [inlined] std::__1::__shared_mutex_base::~__shared_mutex_base[abi:sn180100](this=0xdddddddddddddddd) at shared_mutex:167:56 [opt]
  * frame #3: 0x0000000101870848 _multiarray_umath.cpython-313td-darwin.so`::PyArrayIdentityHash_Dealloc(PyArrayIdentityHash *) [inlined] std::__1::__shared_mutex_base::~__shared_mutex_base[abi:sn180100](this=0xdddddddddddddddd) at shared_mutex:167:56 [opt]
    frame #4: 0x0000000101870848 _multiarray_umath.cpython-313td-darwin.so`::PyArrayIdentityHash_Dealloc(PyArrayIdentityHash *) [inlined] std::__1::shared_mutex::~shared_mutex[abi:sn180100](this=0xdddddddddddddddd) at shared_mutex:192:49 [opt]
    frame #5: 0x0000000101870848 _multiarray_umath.cpython-313td-darwin.so`::PyArrayIdentityHash_Dealloc(PyArrayIdentityHash *) [inlined] std::__1::shared_mutex::~shared_mutex[abi:sn180100](this=0xdddddddddddddddd) at shared_mutex:192:49 [opt]
    frame #6: 0x0000000101870848 _multiarray_umath.cpython-313td-darwin.so`PyArrayIdentityHash_Dealloc(tb=<unavailable>) at npy_hashtable.cpp:131:5 [opt]
    frame #7: 0x0000000101884f04 _multiarray_umath.cpython-313td-darwin.so`ufunc_dealloc(ufunc=0x0000020000a30650) at ufunc_object.c:5196:9 [opt]
    frame #8: 0x0000000100f09408 libpython3.13td.dylib`_Py_Dealloc(op='0x16fdfe7e8') at object.c:2918:5

@cavokz
Author
cavokz commented Dec 9, 2024

Is there anything I can do to help? I'll later try with a numpy built from source.

@ngoldbaum
Member
ngoldbaum commented Dec 9, 2024

No, I think this should be sorted out quickly and we'll do a 2.2.1 release with a fix.

@cavokz
Author
cavokz commented Dec 9, 2024

Thank you for the lightning support!

@ngoldbaum
Member

I edited some of my comments above: there is a dispatching.cpp in the 2.2.0 release, so that's not the issue...

@ngoldbaum
Member

Ping @seberg: do you have any idea why we'd seg fault in the destructor of a std::shared_mutex, but only under the debug python build?

@ngoldbaum
Member

Oh I see, this is a very dumb use-after-free error:

PyMem_Free(tb);
#ifdef Py_GIL_DISABLED
delete (std::shared_mutex *)tb->mutex;
#endif

Those two operations should be switched. Oops!

PR incoming...
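The snippet above also suggests why only debug builds crash, and why the lldb traceback shows this=0xdddddddddddddddd: CPython's debug memory allocator overwrites freed blocks with the byte 0xDD, so reading tb->mutex after PyMem_Free(tb) yields a pointer made entirely of 0xDD bytes. The toy model below is only a sketch of that effect in pure Python. The names debug_free, read_mutex_pointer and MUTEX_ADDR are invented for illustration, the struct is reduced to a single 8-byte pointer field, and none of this is NumPy's or CPython's actual code.

```python
import struct

DEADBYTE = 0xDD      # byte the debug allocator writes over freed memory
MUTEX_ADDR = 0x1000  # hypothetical address of the heap-allocated mutex


def make_table() -> bytearray:
    """Model the struct as one little-endian 8-byte pointer field."""
    return bytearray(struct.pack("<Q", MUTEX_ADDR))


def debug_free(block: bytearray) -> None:
    """Model a debug-build free: overwrite the whole block with 0xDD."""
    block[:] = bytes([DEADBYTE]) * len(block)


def read_mutex_pointer(block: bytearray) -> int:
    """Model reading tb->mutex out of the struct."""
    return struct.unpack_from("<Q", block)[0]


# Buggy order (as in the snippet above): free first, then read the field.
tb = make_table()
debug_free(tb)
stale = read_mutex_pointer(tb)
print(hex(stale))  # 0xdddddddddddddddd

# Switched order: read the field (and destroy the mutex) before freeing.
tb = make_table()
valid = read_mutex_pointer(tb)
debug_free(tb)
print(hex(valid))  # 0x1000
```

In a non-debug build the freed block is typically left untouched, so the stale read can happen to return the old pointer, which would be consistent with the crash showing up only under debug mode; and the mutex only exists under Py_GIL_DISABLED, which would explain the free-threading requirement.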
