F2PY issues with wrapping CUDA fortran codes with PGI compiler · Issue #17575 · numpy/numpy

Closed
nickjbrowning opened this issue Oct 16, 2020 · 2 comments

nickjbrowning commented Oct 16, 2020

Hey guys,

I'm having some trouble using f2py with the PGI Fortran compilers.

cpu_wrapper.f90

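subroutine wrap_stuff(Z, ni)
    use cudafor

    implicit none

    ! Z is read (copied to the device) before it is written back,
    ! so it is declared intent(inout) rather than intent(out)
    real, dimension(:), intent(inout) :: Z
    integer, intent(in) :: ni

    type(dim3) :: grid, tBlock

    ! Device-side copy of Z
    real, device, dimension(:), allocatable :: Z_d

    allocate(Z_d(ni))

    ! One block of 32 threads
    tBlock = dim3(32,1,1)
    grid = dim3(1,1,1)

    ! Host -> device copy
    Z_d = Z

    call do_stuff<<<grid, tBlock>>>(Z_d, ni)

    ! Device -> host copy
    Z = Z_d

end subroutine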

gpu_code.f90

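attributes(global) subroutine do_stuff(Z, N)

    use cudafor

    implicit none

    real, dimension(:), device :: Z
    integer, intent(in) :: N

    integer :: i

    ! Block-stride loop: each thread starts at its own (1-based) index
    ! and advances by the block size until it passes N
    do i = threadIdx%x, N, blockDim%x
        Z(i) = Z(i) ** 2.0
    enddo

end subroutine do_stuff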
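
For checking results once the module imports, the kernel's block-stride loop can be emulated on the host. The following is only a sketch in plain Python; the function name `emulate_do_stuff` and its `block_dim` parameter are made up for illustration and are not part of the code above. It assumes a single block, as in the launch configuration shown in cpu_wrapper.f90.

```python
def emulate_do_stuff(z, n, block_dim=32):
    """Emulate one thread block running the do_stuff kernel on the host.

    Thread t (1-based, like threadIdx%x) handles the elements
    t, t + block_dim, t + 2*block_dim, ... up to n.
    """
    out = list(z)
    for thread_idx in range(1, block_dim + 1):
        # Mirrors: do i = threadIdx%x, N, blockDim%x
        for i in range(thread_idx, n + 1, block_dim):
            # Fortran index i corresponds to Python index i - 1
            out[i - 1] = out[i - 1] ** 2.0
    return out


if __name__ == "__main__":
    # 70 elements, so some threads process more than one element
    data = [float(k) for k in range(1, 71)]
    result = emulate_do_stuff(data, len(data))
    print(result[:5])
```

Each index is visited by exactly one emulated thread, so the output should match an element-wise square of the input and can serve as a reference when comparing against the GPU result.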

command line: f2py --fcompiler=pg -m test -c cpu_wrapper.f90 gpu_code.f90 --f90flags="-Mcuda -fPIC"

Everything compiles fine; however, running ldd -r on test.cpython-37m-x86_64-linux-gnu.so yields the following:

ldd -r test.cpython-37m-x86_64-linux-gnu.so 
	linux-vdso.so.1 (0x00007ffdd1390000)
	libpgf90rtl.so => /opt/pgi/linux86-64/2019/lib/libpgf90rtl.so (0x00007fd1514bc000)
	libpgf90.so => /opt/pgi/linux86-64/2019/lib/libpgf90.so (0x00007fd150f24000)
	libpgf90_rpm1.so => /opt/pgi/linux86-64/2019/lib/libpgf90_rpm1.so (0x00007fd150d22000)
	libpgf902.so => /opt/pgi/linux86-64/2019/lib/libpgf902.so (0x00007fd150b0f000)
	libpgftnrtl.so => /opt/pgi/linux86-64/2019/lib/libpgftnrtl.so (0x00007fd1508d0000)
	libpgatm.so => /opt/pgi/linux86-64/2019/lib/libpgatm.so (0x00007fd1506c7000)
	libpgkomp.so => /opt/pgi/linux86-64/2019/lib/libpgkomp.so (0x00007fd1504c4000)
	libomp.so => /home/nick/anaconda3/lib/libomp.so (0x00007fd151a36000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd1502a5000)
	libpgmath.so => /opt/pgi/linux86-64/2019/lib/libpgmath.so (0x00007fd14fe90000)
	libpgc.so => /opt/pgi/linux86-64/2019/lib/libpgc.so (0x00007fd14fb37000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd14f92f000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd14f591000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd14f1a0000)
	libgcc_s.so.1 => /home/nick/anaconda3/lib/libgcc_s.so.1 (0x00007fd1519f7000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd1518e6000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd14ef9c000)
undefined symbol: PyExc_ValueError	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyCapsule_Type	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: _Py_NoneStruct	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyExc_AttributeError	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyType_Type	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyExc_RuntimeError	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyExc_TypeError	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyExc_ImportError	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyComplex_Type	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyLong_AsLong	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyDict_GetItemString	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyObject_GetAttrString	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyMem_Free	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyType_IsSubtype	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyModule_GetDict	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyErr_NoMemory	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyDict_SetItemString	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyUnicode_FromFormat	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: __cudaRegisterFunction	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyArg_ParseTupleAndKeywords	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: _PyObject_New	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyNumber_Long	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyBytes_FromString	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyErr_Format	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyMem_Malloc	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: Py_BuildValue	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyImport_ImportModule	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: __pgiLaunchKernelFromStub	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyUnicode_FromString	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PySequence_Check	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_allocated_i8	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyErr_Clear	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_copyin	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: __pgiLaunchKernel	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyOS_snprintf	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyDict_New	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyErr_SetString	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyCapsule_New	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_dealloc03_i8	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: __pgi_cuda_register_fat_binaryA	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyObject_SetAttrString	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyCapsule_GetPointer	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_copyout	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyObject_Free	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PySequence_GetItem	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyErr_NewException	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyModule_Create2	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_alloc04_i8	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyErr_Occurred	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyObject_GenericGetAttr	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_dealloc_mbr03_i8	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyDict_DelItemString	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyErr_Print	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyUnicode_Concat	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: PyUnicode_FromStringAndSize	(./test.cpython-37m-x86_64-linux-gnu.so)

Clearly, the python3 library is not being linked, so I added it explicitly with the following command line:

f2py --fcompiler=pg -m test -c cpu_wrapper.f90 gpu_code.f90 --f90flags="-Mcuda -fPIC" /home/nick/anaconda3/lib/libpython3.so

yields:

ldd -r test.cpython-37m-x86_64-linux-gnu.so 
	linux-vdso.so.1 (0x00007fff21106000)
	libpython3.so => /home/nick/anaconda3/lib/libpython3.so (0x00007f734fa2c000)
	libpgf90rtl.so => /opt/pgi/linux86-64/2019/lib/libpgf90rtl.so (0x00007f734f3e2000)
	libpgf90.so => /opt/pgi/linux86-64/2019/lib/libpgf90.so (0x00007f734ee4a000)
	libpgf90_rpm1.so => /opt/pgi/linux86-64/2019/lib/libpgf90_rpm1.so (0x00007f734ec48000)
	libpgf902.so => /opt/pgi/linux86-64/2019/lib/libpgf902.so (0x00007f734ea35000)
	libpgftnrtl.so => /opt/pgi/linux86-64/2019/lib/libpgftnrtl.so (0x00007f734e7f6000)
	libpgatm.so => /opt/pgi/linux86-64/2019/lib/libpgatm.so (0x00007f734e5ed000)
	libpgkomp.so => /opt/pgi/linux86-64/2019/lib/libpgkomp.so (0x00007f734e3ea000)
	libomp.so => /home/nick/anaconda3/lib/libomp.so (0x00007f734f957000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f734e1cb000)
	libpgmath.so => /opt/pgi/linux86-64/2019/lib/libpgmath.so (0x00007f734ddb6000)
	libpgc.so => /opt/pgi/linux86-64/2019/lib/libpgc.so (0x00007f734da5d000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f734d855000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f734d4b7000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f734d0c6000)
	libgcc_s.so.1 => /home/nick/anaconda3/lib/libgcc_s.so.1 (0x00007f734f916000)
	libpython3.7m.so.1.0 => /home/nick/anaconda3/lib/./libpython3.7m.so.1.0 (0x00007f734cd5d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f734f80c000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f734cb59000)
	libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f734c956000)
undefined symbol: __cudaRegisterFunction	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: __pgiLaunchKernelFromStub	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_allocated_i8	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_copyin	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: __pgiLaunchKernel	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_dealloc03_i8	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: __pgi_cuda_register_fat_binaryA	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_copyout	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_alloc04_i8	(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_dealloc_mbr03_i8	(./test.cpython-37m-x86_64-linux-gnu.so)

I can't keep adding individual libraries, so I was wondering if anyone has any ideas?

This code works fine when used from a program file and an executable is created the "usual" way.
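
With long `ldd -r` listings like the ones above, it can help to separate the Python C-API names from everything else, so that the remaining runtime symbols stand out. Below is a minimal sketch of such a filter; the function name `split_undefined_symbols` is hypothetical, and the rule "names starting with `Py` or `_Py` belong to CPython" is an assumption that matches the symbols in this listing but is not a formal guarantee.

```python
import re

# Matches lines such as:
#   undefined symbol: PyErr_Clear	(./test.cpython-37m-x86_64-linux-gnu.so)
_UNDEFINED = re.compile(r"^undefined symbol:\s+(\S+)")


def split_undefined_symbols(ldd_output):
    """Return (python_symbols, other_symbols) found in `ldd -r` output."""
    python_syms, other_syms = [], []
    for line in ldd_output.splitlines():
        match = _UNDEFINED.match(line.strip())
        if not match:
            continue  # library lines such as "libc.so.6 => ..." are skipped
        name = match.group(1)
        # Assumption: CPython C-API names start with "Py" or "_Py"
        if name.startswith(("Py", "_Py")):
            python_syms.append(name)
        else:
            other_syms.append(name)
    return python_syms, other_syms


# A few lines copied from the listing above
sample = """\
undefined symbol: PyExc_ValueError\t(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: __cudaRegisterFunction\t(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: _Py_NoneStruct\t(./test.cpython-37m-x86_64-linux-gnu.so)
undefined symbol: pgf90_dev_copyin\t(./test.cpython-37m-x86_64-linux-gnu.so)
"""
py_syms, rest = split_undefined_symbols(sample)
print("Python C-API:", py_syms)
print("other:", rest)
```

In practice the input would be the captured output of `ldd -r` on the built module; the non-Python group is the one that points at the missing CUDA and PGI runtime libraries.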

nickjbrowning commented Oct 18, 2020

A solution has been found over at the PGI forums: https://forums.developer.nvidia.com/t/compiling-python-wrappers-with-f2py-and-cuda-fortran/157217/5

Essentially, f2py does not pass the -Mcuda flag to the linking step.

The solution is to set NPY_DISTUTILS_APPEND_FLAGS=1 and LDFLAGS="-Mcuda" before running f2py:

NPY_DISTUTILS_APPEND_FLAGS=1 LDFLAGS="-Mcuda" f2py --fcompiler=pg -m test -c cpu_wrapper.f90 gpu_code.f90 --f90flags="-Mcuda -fPIC" /home/nick/anaconda3/lib/libpython3.so

With this, the .so is created, ldd -r reports no missing symbols, and the shared library imports successfully into python3.
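
The same invocation can be assembled from a script, which keeps the two environment variables and the compiler flags in one place. This is only a sketch: the helper name `f2py_cuda_invocation` is made up, and it builds the environment and argument list without running anything. Unlike the one-line command, it appends `-Mcuda` to any `LDFLAGS` already present in the environment; that is a design choice of this sketch, not something from the forum answer.

```python
import os
import shlex


def f2py_cuda_invocation(module, sources, extra_objects=()):
    """Build the environment and argv for the f2py call described above.

    Nothing is executed here; pass the result to subprocess.run() on a
    machine where f2py and the PGI compilers are installed.
    """
    env = dict(os.environ)
    # Ask numpy.distutils to append LDFLAGS to its own link flags
    env["NPY_DISTUTILS_APPEND_FLAGS"] = "1"
    # -Mcuda must also reach the link step; keep any existing LDFLAGS
    env["LDFLAGS"] = (env.get("LDFLAGS", "") + " -Mcuda").strip()
    argv = [
        "f2py", "--fcompiler=pg", "-m", module, "-c", *sources,
        "--f90flags=-Mcuda -fPIC", *extra_objects,
    ]
    return env, argv


env, argv = f2py_cuda_invocation(
    "test",
    ["cpu_wrapper.f90", "gpu_code.f90"],
    extra_objects=["/home/nick/anaconda3/lib/libpython3.so"],
)
print("LDFLAGS=" + env["LDFLAGS"], shlex.join(argv))
# To actually build: subprocess.run(argv, env=env, check=True)
```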

It's still a bit confusing, though: why do I need to link against libpython3.so explicitly? Shouldn't f2py figure this out?
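
One possible explanation, offered as a general observation and not confirmed in this thread for the PGI toolchain: on Linux, extension modules are commonly left unlinked against libpython, and the `Py*` symbols are resolved from the running interpreter at import time. In that case `ldd -r` on the .so alone would report them as undefined even though `import` succeeds. The sketch below uses the standard `sysconfig` module to show how the current interpreter was built; the helper name `libpython_link_info` is hypothetical.

```python
import sysconfig


def libpython_link_info():
    """Collect build settings describing how this interpreter provides libpython."""
    return {
        # 1 if Python was built with a shared libpython, 0 or None otherwise
        "Py_ENABLE_SHARED": sysconfig.get_config_var("Py_ENABLE_SHARED"),
        # File name of the Python library (shared or static)
        "LDLIBRARY": sysconfig.get_config_var("LDLIBRARY"),
        # Directory where that library is installed
        "LIBDIR": sysconfig.get_config_var("LIBDIR"),
    }


for key, value in libpython_link_info().items():
    print(f"{key} = {value}")
```

If `Py_ENABLE_SHARED` is 0, the interpreter executable itself carries the C-API symbols, which would be consistent with the build tooling not adding libpython to the link line.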

HaoZeke commented Sep 5, 2023

After #24532, this should be handled by manually modifying the meson.build files.
