[BUG]: Unresolved symbols or segfaults when importing tensorflow (and numpy) · Issue #3543 · pybind/pybind11
@titardrew

Problem description

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04.5
  • TensorFlow/NumPy installed from (source or binary): PyPI
  • Python version: 3.6.9
  • Virtual environment is used
  • No GPU/CUDA
  • Pybind11 version:
#define PYBIND11_VERSION_MAJOR 2
#define PYBIND11_VERSION_MINOR 9
#define PYBIND11_VERSION_PATCH 0.dev1
// Similar to Python's convention: https://docs.python.org/3/c-api/apiabiversion.html
// Additional convention: 0xD = dev
#define PYBIND11_VERSION_HEX 0x020900D1

The issue
I am trying to import TensorFlow from a Python interpreter embedded with pybind11:

import tensorflow

However, it fails with the following message:

terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  ImportError: Traceback (most recent call last):
  File "/home/user/project/venv/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: /home/user/project/venv/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: undefined symbol: _ZN10tensorflow2io20InputStreamInterface10SkipNBytesEx


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

At:
  /home/user/project/venv/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py(83): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(678): exec_module
  <frozen importlib._bootstrap>(665): _load_unlocked
  <frozen importlib._bootstrap>(955): _find_and_load_unlocked
  <frozen importlib._bootstrap>(971): _find_and_load
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap>(1031): _handle_fromlist
  /home/user/project/venv/lib/python3.6/site-packages/tensorflow/python/__init__.py(39): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(678): exec_module
  <frozen importlib._bootstrap>(665): _load_unlocked
  <frozen importlib._bootstrap>(955): _find_and_load_unlocked
  <frozen importlib._bootstrap>(971): _find_and_load
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap>(941): _find_and_load_unlocked
  <frozen importlib._bootstrap>(971): _find_and_load
  /home/user/project/venv/lib/python3.6/site-packages/tensorflow/__init__.py(41): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(678): exec_module
  <frozen importlib._bootstrap>(665): _load_unlocked
  <frozen importlib._bootstrap>(955): _find_and_load_unlocked
  <frozen importlib._bootstrap>(971): _find_and_load
  /home/user/project/python/tf_import_script.py(4): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(678): exec_module
  <frozen importlib._bootstrap>(665): _load_unlocked
  <frozen importlib._bootstrap>(955): _find_and_load_unlocked
  <frozen importlib._bootstrap>(971): _find_and_load

Aborted
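
For readers unfamiliar with mangled names, the undefined symbol in the ImportError can be decoded by hand. The sketch below is a toy decoder that only handles the simple `_ZN<len><name>...E` shape seen here (no templates or substitutions); `decode_nested_name` is a hypothetical helper, not part of any library. In practice, `c++filt` from binutils does this properly. The trailing `x` in the symbol encodes a single `long long` parameter.

```python
import re


def decode_nested_name(symbol: str) -> str:
    """Decode the nested-name part of a simple Itanium-mangled C++ symbol.

    Only the `_ZN <len><name> ... E` shape is supported; templates,
    substitutions, and cv-qualifiers are deliberately out of scope.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a simple nested mangled name")
    pos = 3
    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError(f"unsupported mangling at offset {pos}")
        length = int(match.group())
        start = pos + match.end()
        parts.append(symbol[start:start + length])
        pos = start + length
    return "::".join(parts)


symbol = "_ZN10tensorflow2io20InputStreamInterface10SkipNBytesEx"
print(decode_nested_name(symbol))
```

So the loader could not find `tensorflow::io::InputStreamInterface::SkipNBytes(long long)` when resolving `_pywrap_tensorflow_internal.so`.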

Oddly, the problem does not occur in the REPL or when the Python interpreter is invoked directly.
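
Since the failure only shows up in the embedded interpreter, one way to narrow it down is to dump the same interpreter settings from both contexts and diff them. The snippet below is a minimal sketch under the assumption that the difference is environmental (search path, virtual environment prefix, dynamic-loader settings); `interpreter_snapshot` is a hypothetical helper name, and the choice of fields is a guess at what commonly differs, not a confirmed cause.

```python
import os
import sys


def interpreter_snapshot() -> dict:
    """Collect interpreter settings that may differ between embedded and standalone runs."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "path": list(sys.path),
        # dlopen flags affect how symbols of extension modules are resolved (Unix only).
        "dlopen_flags": sys.getdlopenflags() if hasattr(sys, "getdlopenflags") else None,
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
        "LD_PRELOAD": os.environ.get("LD_PRELOAD"),
    }


# Print one setting per line so the two outputs can be compared with a diff tool.
for key, value in interpreter_snapshot().items():
    print(f"{key}: {value!r}")
```

Running this at the top of the imported script from the embedding application, and again with `python` from the activated virtual environment, shows whether both see the same `sys.prefix` and `sys.path`.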

With a different Python version (3.8 or 3.9), a segfault occurs even when importing only NumPy (1.19.5, although the version does not seem to matter). Here is the call stack from gdb:

...
[New Thread 0x7fff03878700 (LWP 11957)]
[New Thread 0x7ffefb077700 (LWP 11958)]
[New Thread 0x7ffefa876700 (LWP 11959)]

Thread 1 "aq_runner" received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65      ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) up
#1  0x00007fffe72a66d9 in PyUnicode_FromString () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#2  0x00007ffee7fda5f5 in ?? () from /usr/lib/python3.9/lib-dynload/_hashlib.cpython-39-x86_64-linux-gnu.so
(gdb)
#3  0x00007ffee7c824ac in OPENSSL_LH_doall_arg () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
(gdb)
#4  0x00007ffee7c8e287 in OBJ_NAME_do_all () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
(gdb)
#5  0x00007ffee7c79015 in EVP_MD_do_all () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
(gdb)
#6  0x00007ffee7fdfa1a in PyInit__hashlib () from /usr/lib/python3.9/lib-dynload/_hashlib.cpython-39-x86_64-linux-gnu.so
(gdb)
#7  0x00007fffe737cebc in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#8  0x00007fffe737cc3d in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#9  0x00007fffe728fbda in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#10 0x00007fffe72621ab in PyVectorcall_Call () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
...
#208 0x00007fffe7260a5d in _PyObject_MakeTpCall () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#209 0x00007fffe726267a in PyObject_CallFunction () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#210 0x00007fffe72e71fa in PyImport_Import () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#211 0x00007fffe737c15d in PyImport_ImportModule () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#212 0x000055555556fd9d in pybind11::module_::import (name=0x5555556535d9 "numpy") at /home/user/project/third_party/pybind11/include/pybind11/pybind11.h:1063
1063            PyObject *obj = PyImport_ImportModule(name);
(gdb)
#213 0x000055555560e9e9 in aq::PythonInterpreter::Init (this=0x5555558abde8 <aq::g__PythonInterpreter>) at /home/user/project/src/common/python.cc:24
24              py::module::import("numpy");

This looks like a heisenbug: it did not occur for a while, and reverting my codebase to a previously stable commit did not make it go away. Reinstalling Python and the entire environment a couple of times did not help either.
I suspect the root cause is related to my environment and/or Python interpreter distribution, but I do not have a clear idea of where to start.
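
One hypothesis worth ruling out, given that the crash sits between `_hashlib`, `libcrypto`, and `libpython`, is that more than one copy of one of these libraries ends up mapped into the process. The sketch below lists matching shared objects from `/proc/self/maps` (Linux only). It is an assumption-laden starting point, not a diagnosis: `loaded_libraries` and `report_self` are hypothetical helpers, and a copy linked statically into the host executable would not show up here.

```python
import os
import re
from collections import defaultdict


def loaded_libraries(maps_text: str, pattern: str = r"lib(python|crypto|ssl)") -> dict:
    """Group mapped shared objects by base name, given the text of /proc/<pid>/maps."""
    found = defaultdict(set)
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a path
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        if re.search(pattern, name):
            # "libcrypto.so.1.1" -> "libcrypto"
            found[name.split(".so")[0]].add(path)
    return {key: sorted(paths) for key, paths in found.items()}


def report_self() -> None:
    """Print the matching libraries mapped into the current process."""
    with open("/proc/self/maps") as handle:
        libs = loaded_libraries(handle.read())
    for key, paths in sorted(libs.items()):
        note = "  <-- more than one copy mapped" if len(paths) > 1 else ""
        print(f"{key}: {paths}{note}")


if os.path.exists("/proc/self/maps"):
    report_self()
```

Calling `report_self()` from the embedded interpreter just before the failing import, and from a standalone `python`, gives two lists to compare.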

Reproducible example code

No response
