Unable to build and use libtorch function via pybind11: undefined symbol error upon import · Issue #73016 · pytorch/pytorch
Open
anubane opened this issue Feb 17, 2022 · 15 comments
Labels
module: cpp Related to C++ API · module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration · triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@anubane
anubane commented Feb 17, 2022

🐛 Describe the bug

I am attempting to write a C++ function in libtorch, compile it with pybind11 and then import the .so file in a python file.

The build seems to work fine (no errors are emitted) but, when I try to import it into my python file, execution fails with the error message:
ImportError: /home/cppNpython/MyProj/cpp/build/srcfile.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

This is my directory structure:

|-- MyProj
|   |-- cpp
|   |   |-- build
|   |   |   `-- __init__.py
|   |   |-- include
|   |   |   `-- srcfile.h
|   |   |-- lib
|   |   |   |-- libtorch
|   |   |   `-- pybind11
|   |   |-- source
|   |   |   `-- srcfile.cpp
|   |   `-- CMakeLists.txt
|   `-- __init__.py
`-- test.py

Here are the contents of each file:
srcfile.cpp

#include <srcfile.h>

torch::Tensor test(torch::Tensor t1, torch::Tensor t2) {
  return t1;
}

PYBIND11_MODULE(srcfile, m) {
    m.doc() = "demo function"; // optional module docstring
    m.def("test", &test, "A function giving torch tensor");
}

srcfile.h

#ifndef __SRCFILE_H__
#define __SRCFILE_H__

#include <torch/torch.h>
#include <pybind11/pybind11.h>

torch::Tensor test(torch::Tensor, torch::Tensor);

#endif

CMakeLists.txt

cmake_minimum_required(VERSION 3.10)
project(srcfile)
set(CMAKE_CXX_STANDARD 14)

find_package(PythonLibs REQUIRED)
set(CMAKE_PREFIX_PATH "lib/libtorch/share/cmake/Torch")
find_package(Torch REQUIRED)

include_directories(lib/libtorch)

include_directories(${PYTHON_INCLUDE_DIRS})

add_subdirectory(lib/pybind11)
pybind11_add_module(srcfile source/srcfile.cpp)

target_include_directories(srcfile PRIVATE include/)
target_link_libraries(srcfile PUBLIC ${TORCH_LIBRARIES})

set_property(TARGET srcfile PROPERTY CXX_STANDARD 14)

and at last test.py

import torch
from MyProj.cpp.build import srcfile as j

def main():
    t1 = torch.rand(2,2)
    t2 = torch.rand(2,2)
    print(j.test(t1, t2))


if __name__ == '__main__':
    main()

Inside the build directory I am executing the following command at the terminal:

cmake .. \
-DPYTHON_INCLUDE_DIR=$(python -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())")  \
-DPYTHON_LIBRARY=$(python -c "import distutils.sysconfig as sysconfig; print(sysconfig.get_config_var('LIBDIR'))") \
-DTORCH_LIBRARIES=lib/libtorch && make

which does not emit any error.
I get the error only when I am attempting to import.

I am not sure what I am doing wrong here. Any help/suggestion is welcome.
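For reference, a common cause of import-time undefined-symbol errors with this kind of setup is an ABI/flag mismatch between the extension and the libtorch binaries. A hedged sketch of CMake additions that often help (paths assume the tree above; `find_package(Torch)` sets `TORCH_CXX_FLAGS`, which carries the `-D_GLIBCXX_USE_CXX11_ABI=...` value the libtorch binaries were built with):

```cmake
# Sketch only: propagate libtorch's compile flags into the extension so the
# std::string ABI of the extension matches the shipped libtorch binaries.
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

# Source path assumed from the directory tree in the original post.
pybind11_add_module(srcfile source/srcfile.cpp)
target_link_libraries(srcfile PUBLIC ${TORCH_LIBRARIES})
```

Without the `TORCH_CXX_FLAGS` line, the extension can be compiled with a different `_GLIBCXX_USE_CXX11_ABI` setting than libtorch, which produces exactly this class of `undefined symbol: ...basic_string...` error.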

Versions

PyTorch version: 1.10.2
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.9

Python version: 3.7.0 (default, Oct  9 2018, 10:31:47)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.15.0-162-generic-x86_64-with-debian-buster-sid
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] torch==1.10.2
[conda] blas                      1.0                         mkl  
[conda] cpuonly                   2.0                           0    pytorch
[conda] mkl                       2022.0.1           h06a4308_117  
[conda] numpy                     1.21.5                   pypi_0    pypi
[conda] pytorch                   1.10.2              py3.7_cpu_0    pytorch
[conda] pytorch-mutex             1.0                         cpu    pytorch

cc @jbschlosser @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @VitalyFedyunin

@gchanan added the module: mkldnn, module: cpp, and triaged labels on Feb 17, 2022
@anubane
Author
anubane commented Feb 17, 2022

The error message:
undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Upon demangling looks as follows:
undefined symbol: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)

Any thoughts on what might be wrong here?
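As a quick sanity check, you can ask the dynamic loader directly whether a library actually exports a given symbol, without writing any C++. A minimal Python sketch, demonstrated here against libc (for the issue above you would substitute the path to `libtorch_cpu.so` and the mangled `_ZN3c10...` name from the error message):

```python
import ctypes
import ctypes.util

def exports_symbol(lib_path, symbol):
    """Return True if the shared library exports the given symbol."""
    lib = ctypes.CDLL(lib_path)
    # getattr on a CDLL performs a dlsym lookup; missing symbols raise
    # AttributeError, so hasattr doubles as an export check.
    return hasattr(lib, symbol)

# Demonstrated with libc; substitute e.g. "lib/libtorch/lib/libtorch_cpu.so"
# and the mangled symbol from the ImportError.
libc_path = ctypes.util.find_library("c")
print(exports_symbol(libc_path, "printf"))       # True
print(exports_symbol(libc_path, "no_such_sym"))  # False
```

If the mangled symbol is missing from every library under `lib/libtorch/lib`, the extension and libtorch were built against different header/binary versions or ABI settings.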

@anubane
Author
anubane commented Feb 18, 2022

I was previously using the cxx11 ABI version of the libtorch package available on pytorch.org.

Just for an experiment, I used the Pre-cxx11 ABI version and now I am getting the following error:

Traceback (most recent call last):
  File "../../../test.py", line 14, in <module>
    main()
  File "../../../test.py", line 10, in main
    print(j.test(t1, t2))
TypeError: test(): incompatible function arguments. The following argument types are supported:
    1. (arg0: at::Tensor, arg1: at::Tensor) -> at::Tensor

Invoked with: tensor([[0.4624, 0.3462],
        [0.4250, 0.3891]]), tensor([[0.6797, 0.2709],
        [0.5302,  0.7420]])

Is this error related to #20356?

@anubane
Author
anubane commented Mar 17, 2022

I guess this currently cannot be done via CMake, only via setuptools, which PyTorch already supports via cpp_extension.

I am able to build the project now but cannot debug libtorch.

Issue at vscode-cpptools

@ziyuang
ziyuang commented Apr 1, 2022

I have a similar issue where _Z16THPVariable_WrapN2at10TensorBaseE (or THPVariable_Wrap(at::TensorBase)) becomes undefined. I have tried both cxx11 ABI and pre-cxx11 ABI but neither works. By "via setuptools" do you mean this part in the official documentation? https://pytorch.org/tutorials/advanced/cpp_extension.html#building-with-setuptools

@roastduck

Any update on this thread? Same issue here with the missing _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE. In my case, the issue only occurs when building with -DCMAKE_BUILD_TYPE=Debug. Maybe we have to build the C++ extension with the same CMAKE_BUILD_TYPE as the one used to build PyTorch itself?

@roastduck

Defining a macro named STRIP_ERROR_MESSAGES solves my problem. Maybe PyTorch defines this macro in non-debug mode.
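That would be consistent with the failure mode: if an assert-style macro in the headers only references an out-of-line helper when error messages are not stripped, a Debug extension build can reference a symbol that the shipped (stripped) libtorch binaries never exported. An illustrative C++ sketch of the mechanism (the names here are made up for illustration, not the real PyTorch internals):

```cpp
#include <cstdio>

// Out-of-line helper: only referenced (and thus required at link/load time)
// when error messages are not stripped out of the assert macro.
void assert_fail_verbose(const char* expr, const char* msg) {
    std::printf("assertion failed: %s (%s)\n", expr, msg);
}

#ifdef STRIP_ERROR_MESSAGES
// Stripped build: the macro expands to nothing that references the helper,
// so the compiled object carries no undefined reference to it.
#define MY_INTERNAL_ASSERT(cond, msg) ((void)(cond))
#else
#define MY_INTERNAL_ASSERT(cond, msg) \
    ((cond) ? (void)0 : assert_fail_verbose(#cond, (msg)))
#endif
```

Compiled without STRIP_ERROR_MESSAGES, every use of MY_INTERNAL_ASSERT pulls in assert_fail_verbose; compiled with it, the reference disappears, which is why defining the macro can make the undefined-symbol error go away.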

@Jay-Ye
Jay-Ye commented Aug 7, 2022

> [quoting @anubane's Feb 18, 2022 comment above about the pre-cxx11 ABI "incompatible function arguments" error]

@anubane I'm getting exactly the same "incompatible function arguments" error. Did you solve this by compiling the code via setup.py with CppExtension and BuildExtension in torch.utils.cpp_extension?

@mananshah99

Running into the same problem as well, even when building torch using the same libtorch.so as my python torch installation. Any chance there is a recommended solution?

@anubane
Author
anubane commented Sep 19, 2022

> [quoting @Jay-Ye's question above]

Yes, currently cpp_extension (setuptools) seems to be the only solution.
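For anyone landing here, a minimal setup.py along those lines might look like this (a sketch assuming the directory layout from the original post; torch must be importable at build time):

```python
# setup.py -- minimal sketch using torch.utils.cpp_extension
# (source/ and include/ paths assume the directory tree in the original post)
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="srcfile",
    ext_modules=[
        CppExtension(
            name="srcfile",
            sources=["source/srcfile.cpp"],
            include_dirs=["include"],
        )
    ],
    # BuildExtension reuses the compiler and ABI flags that the installed
    # torch was built with, so _GLIBCXX_USE_CXX11_ABI matches automatically.
    cmdclass={"build_ext": BuildExtension},
)
```

Build with `python setup.py build_ext --inplace` (or `pip install .`); the flag matching is what makes this route avoid the undefined-symbol errors seen with a hand-rolled CMake build.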

@grybouilli

> [quoting the exchange above between @anubane and @Jay-Ye about cpp_extension being the only solution]

Hello, is there still no other solution than this?

@woywoy123

Having the exact same issue, but only for CUDA extensions.

@grybouilli

Worked through my problem, if this can help anyone:

Check that anything you're trying to build with libtorch is built with the same C++ standard library that libtorch was built with.

Let me explain:
I was trying to integrate libtorch into a project that was compiled and built with clang, which uses libc++. The problem is that libtorch (unless you build it from source yourself) is built with gcc, which uses libstdc++. So if you ever do what I did, you can try libtorch with or without the cxx11 ABI, with or without _GLIBCXX_USE_CXX11_ABI set to 0, and you will still get undefined symbols, because the std::basic_string coming from your project gets mixed up with the std::__cxx11::basic_string from libtorch.

So either build libtorch from source with your preferred compiler, or build your project with libstdc++ (gcc with the C++11 standard or higher).

@woywoy123

https://github.com/r-barnes/pytorch_cmake_example

This looks very interesting.

@alfie-nsugh

https://github.com/r-barnes/pytorch_cmake_example

This looks very interesting.

It is, thank you.

@leeeizhang

> [quoting @grybouilli's explanation above about matching libtorch's C++ standard library]

Thanks! It was very helpful. In the conda env, I can build the cpp extension successfully only when the libstdcxx-ng version is the same as the gcc version.

conda create -n dev-test -c conda-forge gcc=11.1 gxx=11.1 libstdcxx-ng=11.1 python=3.11
