Description
Required prerequisites
- Make sure you've read the documentation. Your issue may be addressed there.
- Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
- Consider asking first in the Gitter chat room or in a Discussion.
Problem description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04.5
- TensorFlow/NumPy installed from (source or binary): PyPI
- Python version: 3.6.9
- Virtual environment is used
- No GPU/CUDA
- Pybind11 version:
#define PYBIND11_VERSION_MAJOR 2
#define PYBIND11_VERSION_MINOR 9
#define PYBIND11_VERSION_PATCH 0.dev1
// Similar to Python's convention: https://docs.python.org/3/c-api/apiabiversion.html
// Additional convention: 0xD = dev
#define PYBIND11_VERSION_HEX 0x020900D1
The issue
I am trying to use tensorflow in embedded Python with pybind11.
import tensorflow
However, it fails with the following message:
terminate called after throwing an instance of 'pybind11::error_already_set'
what(): ImportError: Traceback (most recent call last):
File "/home/user/project/venv/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in <module>
from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: /home/user/project/venv/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: undefined symbol: _ZN10tensorflow2io20InputStreamInterface10SkipNBytesEx
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
At:
/home/user/project/venv/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py(83): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(678): exec_module
<frozen importlib._bootstrap>(665): _load_unlocked
<frozen importlib._bootstrap>(955): _find_and_load_unlocked
<frozen importlib._bootstrap>(971): _find_and_load
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(1031): _handle_fromlist
/home/user/project/venv/lib/python3.6/site-packages/tensorflow/python/__init__.py(39): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(678): exec_module
<frozen importlib._bootstrap>(665): _load_unlocked
<frozen importlib._bootstrap>(955): _find_and_load_unlocked
<frozen importlib._bootstrap>(971): _find_and_load
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(941): _find_and_load_unlocked
<frozen importlib._bootstrap>(971): _find_and_load
/home/user/project/venv/lib/python3.6/site-packages/tensorflow/__init__.py(41): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(678): exec_module
<frozen importlib._bootstrap>(665): _load_unlocked
<frozen importlib._bootstrap>(955): _find_and_load_unlocked
<frozen importlib._bootstrap>(971): _find_and_load
/home/user/project/python/tf_import_script.py(4): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(678): exec_module
<frozen importlib._bootstrap>(665): _load_unlocked
<frozen importlib._bootstrap>(955): _find_and_load_unlocked
<frozen importlib._bootstrap>(971): _find_and_load
Aborted
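For an undefined-symbol failure like the one above, it may help to see which shared objects are actually mapped into the process, and to compare that list between the embedding executable and a plain `python` run. The following is only a minimal sketch, assuming Linux and the usual `/proc/<pid>/maps` layout; the function names `shared_objects_from_maps` and `loaded_shared_objects` are hypothetical helpers, not part of pybind11 or TensorFlow.

```python
import os


def shared_objects_from_maps(maps_text):
    """Return sorted, de-duplicated paths of shared objects listed in maps_text.

    maps_text is assumed to follow the /proc/<pid>/maps layout on Linux,
    where the backing file path (if any) is the sixth whitespace-separated field.
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        name = os.path.basename(path)
        # Match both "foo.so" and versioned names such as "foo.so.1.1".
        if name.endswith(".so") or ".so." in name:
            paths.add(path)
    return sorted(paths)


def loaded_shared_objects(maps_path="/proc/self/maps"):
    """Read the mappings of the current process (Linux only)."""
    with open(maps_path) as f:
        return shared_objects_from_maps(f.read())


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    # Print only the libraries that seem most relevant to this report.
    keys = ("tensorflow", "libpython", "libcrypto", "libssl")
    for path in loaded_shared_objects():
        if any(key in path for key in keys):
            print(path)
```

Running this once from the embedded interpreter (before the failing import) and once from the standalone interpreter, then diffing the two outputs, could show whether a different `libpython`, `libcrypto`, or TensorFlow library is being picked up in the embedded case.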
The weird part is that the problem does not occur in the REPL or when the Python interpreter is invoked directly.
When I use a different Python version (3.8 or 3.9), a segfault occurs even when I import only NumPy (1.19.5, though the version does not seem to matter). Here is the call stack from gdb:
...
[New Thread 0x7fff03878700 (LWP 11957)]
[New Thread 0x7ffefb077700 (LWP 11958)]
[New Thread 0x7ffefa876700 (LWP 11959)]
Thread 1 "aq_runner" received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) up
#1 0x00007fffe72a66d9 in PyUnicode_FromString () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#2 0x00007ffee7fda5f5 in ?? () from /usr/lib/python3.9/lib-dynload/_hashlib.cpython-39-x86_64-linux-gnu.so
(gdb)
#3 0x00007ffee7c824ac in OPENSSL_LH_doall_arg () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
(gdb)
#4 0x00007ffee7c8e287 in OBJ_NAME_do_all () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
(gdb)
#5 0x00007ffee7c79015 in EVP_MD_do_all () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
(gdb)
#6 0x00007ffee7fdfa1a in PyInit__hashlib () from /usr/lib/python3.9/lib-dynload/_hashlib.cpython-39-x86_64-linux-gnu.so
(gdb)
#7 0x00007fffe737cebc in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#8 0x00007fffe737cc3d in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#9 0x00007fffe728fbda in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#10 0x00007fffe72621ab in PyVectorcall_Call () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
...
#208 0x00007fffe7260a5d in _PyObject_MakeTpCall () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#209 0x00007fffe726267a in PyObject_CallFunction () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#210 0x00007fffe72e71fa in PyImport_Import () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#211 0x00007fffe737c15d in PyImport_ImportModule () from /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb)
#212 0x000055555556fd9d in pybind11::module_::import (name=0x5555556535d9 "numpy") at /home/user/project/third_party/pybind11/include/pybind11/pybind11.h:1063
1063 PyObject *obj = PyImport_ImportModule(name);
(gdb)
#213 0x000055555560e9e9 in aq::PythonInterpreter::Init (this=0x5555558abde8 <aq::g__PythonInterpreter>) at /home/user/project/src/common/python.cc:24
24 py::module::import("numpy");
This looks like some sort of heisenbug: I did not see it for a while, and reverting my codebase to a previously stable commit did not help. I also reinstalled Python and the entire environment a couple of times, which did not help either.
I think the root cause might be related to my environment and/or Python interpreter distribution, but I don't have a clear idea of where to start.
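Since the failure only shows up in the embedded interpreter, one possible starting point is to dump the interpreter's view of its environment in both contexts and compare the results. This is a hedged sketch rather than a known fix; `interpreter_snapshot` is a hypothetical helper name, and the set of environment variables it records is an assumption about what is likely to differ.

```python
import json
import os
import sys
import sysconfig


def interpreter_snapshot():
    """Collect details that may differ between an embedded and a standalone interpreter."""
    env_names = ("PYTHONHOME", "PYTHONPATH", "LD_LIBRARY_PATH", "VIRTUAL_ENV")
    return {
        "version": sys.version,
        "executable": sys.executable,    # may be empty or the host binary when embedded
        "prefix": sys.prefix,            # points into the venv if one is active
        "base_prefix": sys.base_prefix,  # the underlying Python installation
        "path": list(sys.path),
        "libdir": sysconfig.get_config_var("LIBDIR"),
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
        "env": {name: os.environ.get(name) for name in env_names},
    }


if __name__ == "__main__":
    print(json.dumps(interpreter_snapshot(), indent=2, sort_keys=True))
```

Executing the same file through the embedding application and through `python` directly, then diffing the two JSON outputs, would show whether `sys.prefix`, `sys.path`, or the library directories differ between the two cases.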
Reproducible example code
No response