[Python] RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed · Issue #35237 · apache/arrow · GitHub
[Python] RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed #35237
@jleibs

Description

Describe the bug, including details regarding any error messages, version, and platform.

This is a relatively straightforward problem: a thread that is still running during interpreter shutdown tries to register an atexit handler, which is no longer allowed at that point.

This only happens if the pandas library is installed, which causes pyarrow's pandas shim to be used. It happens regardless of whether the application actually uses pandas.

The problem can be avoided by making sure to join all threads before main exits, but Python does not generally require this, so it should be considered a bug.
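A minimal sketch of that workaround, for illustration only. The helper name `run_in_joined_thread` is hypothetical (it is not part of pyarrow or of the repro below), and the lambda is a stand-in for the `use_pyarrow` worker:

```python
import threading
from typing import Callable


def run_in_joined_thread(work: Callable[[], None]) -> None:
    """Run `work` in a thread and block until it has finished."""
    t = threading.Thread(target=work)
    t.start()
    # Joining guarantees the worker, including any lazy imports it triggers,
    # completes before the main thread returns and shutdown begins.
    t.join()


# Stand-in worker; in the repro this would be use_pyarrow.
results = []
run_in_joined_thread(lambda: results.append("done"))
print(results)
```

Joining only sidesteps the error; it does not change the fact that an unjoined non-daemon thread is otherwise legal in Python.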

Context to reproduce:

requirements.txt

pandas==2.0.0
pyarrow==11.0.0

main.py

import threading
import pyarrow


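def use_pyarrow() -> None:
    # Any pyarrow call that triggers the lazy pandas shim import will do.
    table = pyarrow.table({"a": [1, 2, 3]})


def main() -> None:
    t = threading.Thread(target=use_pyarrow, args=())
    t.start()
    # Deliberately no t.join(): main returns while the thread is still running.


if __name__ == "__main__":
    main()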

Run:

$ python main.py 
Traceback (most recent call last):
  File "pyarrow/pandas-shim.pxi", line 100, in pyarrow.lib._PandasAPIShim._check_import
  File "pyarrow/pandas-shim.pxi", line 48, in pyarrow.lib._PandasAPIShim._import_pandas
  File "/home/jleibs/pyarrow-repro/venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 24, in <module>
    import concurrent.futures.thread  # noqa
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 37, in <module>
    threading._register_atexit(_python_exit)
  File "/usr/lib/python3.10/threading.py", line 1504, in _register_atexit
    raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown
Exception ignored in: 'pyarrow.lib._PandasAPIShim._have_pandas_internal'
Traceback (most recent call last):
  File "pyarrow/pandas-shim.pxi", line 100, in pyarrow.lib._PandasAPIShim._check_import
  File "pyarrow/pandas-shim.pxi", line 48, in pyarrow.lib._PandasAPIShim._import_pandas
  File "/home/jleibs/pyarrow-repro/venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 24, in <module>
    import concurrent.futures.thread  # noqa
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 37, in <module>
    threading._register_atexit(_python_exit)
  File "/usr/lib/python3.10/threading.py", line 1504, in _register_atexit
    raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown
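The last frames of the traceback show `threading._register_atexit` refusing a registration once shutdown has started. The toy model below sketches that kind of guard. It is a simplified illustration, not CPython's actual code, and the class name `ShutdownRegistry` and its methods are hypothetical:

```python
import functools
from typing import Any, Callable, List


class ShutdownRegistry:
    """Toy registry that rejects new callbacks once shutdown has begun."""

    def __init__(self) -> None:
        self._shutting_down = False
        self._callbacks: List[Callable[[], Any]] = []

    def register(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        if self._shutting_down:
            # Same message as in the traceback above.
            raise RuntimeError("can't register atexit after shutdown")
        self._callbacks.append(functools.partial(func, *args, **kwargs))

    def shutdown(self) -> None:
        # Flip the flag first, then run callbacks in reverse registration order.
        self._shutting_down = True
        for call in reversed(self._callbacks):
            call()


registry = ShutdownRegistry()
registry.register(print, "registered in time")
registry.shutdown()  # models the main thread returning

late_error = None
try:
    # Models the still-running worker thread whose lazy import
    # tries to register a handler after shutdown has started.
    registry.register(print, "too late")
except RuntimeError as exc:
    late_error = str(exc)
print(late_error)
```

In the repro, the late registration comes from `import concurrent.futures.thread` inside `pyarrow/pandas_compat.py`, which runs in the worker thread after `main()` has already returned.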

Component(s)

Python
