BUG: f2py undefined symbol for MPI application in HPE-Cray cluster with gfortran compiler (or "ftn" wrapper) · Issue #25159 · numpy/numpy · GitHub

Open
AlexisEspinosaGayosso opened this issue Nov 16, 2023 · 4 comments

AlexisEspinosaGayosso commented Nov 16, 2023

Describe the issue:

I have a Fortran subroutine that uses MPI:

subroutine sayhello
    use mpi
    implicit none

    integer :: rank, nprocs, ierr, namelength
    ! MPI requires the name buffer to hold at least MPI_MAX_PROCESSOR_NAME characters
    character(len=MPI_MAX_PROCESSOR_NAME) :: processorname

    call MPI_INIT(ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_GET_PROCESSOR_NAME(processorname, namelength, ierr)
    print *, 'Hello, World! I am process ', rank, ' of ', nprocs, '.'
    call MPI_FINALIZE(ierr)

end subroutine sayhello

I then build the extension module with f2py:

$ module load PrgEnv-gnu/8.3.3 python/3.10.10 py-numpy/1.23.4 py-mpi4py/3.1.4-py3.10.10
$ LDSHARED=ftn F77=ftn F90=ftn f2py --f90exec=ftn --verbose -c helloworld.f90 -m helloworld

This builds the module without errors, producing:
helloworld.cpython-310-x86_64-linux-gnu.so

However, importing the module fails with an undefined MPI symbol:

$ python -c "import helloworld"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /software/projects/bq2/espinosa/f2py/f2py_test/helloworld.cpython-310-x86_64-linux-gnu.so: undefined symbol: mpi_comm_rank_
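The missing symbol mpi_comm_rank_ is the name gfortran emits for the external procedure MPI_Comm_rank: external names are lowercased and, with default underscoring and no second underscore (the build log shows -fno-second-underscore), get exactly one trailing underscore. The sketch below uses hypothetical helpers (not part of f2py or NumPy) that reproduce that mapping and pull the symbol name out of the ImportError message, so it can be looked up in candidate MPI libraries.

```python
import re
from typing import Optional


def gfortran_symbol(name: str) -> str:
    """Linker symbol gfortran emits for an external procedure.

    Assumes default underscoring without a second underscore
    (-fno-second-underscore): lowercase the name, append one underscore.
    """
    return name.lower() + "_"


def missing_symbol(message: str) -> Optional[str]:
    """Extract NAME from an '... undefined symbol: NAME' ImportError message."""
    match = re.search(r"undefined symbol: (\S+)", message)
    return match.group(1) if match else None


if __name__ == "__main__":
    msg = ("/software/projects/bq2/espinosa/f2py/f2py_test/"
           "helloworld.cpython-310-x86_64-linux-gnu.so: undefined symbol: mpi_comm_rank_")
    print(missing_symbol(msg))               # mpi_comm_rank_
    print(gfortran_symbol("MPI_Comm_rank"))  # mpi_comm_rank_
```

Running nm -D --defined-only on a candidate library and searching for the printed name shows whether that library actually provides the symbol.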

When reviewing the compilation output, I can see that the "ftn" wrapper (which supplies the include paths, library paths and libraries needed to build MPI code on the HPE-Cray system) is used for compilation but not for linking (partial output below):

INFO: customize Gnu95FCompiler
DEBUG: find_executable('gfortran')
INFO: Found executable /opt/cray/pe/gcc/12.2.0/bin/gfortran
INFO: customize Gnu95FCompiler using build_ext
********************************************************************************
<class 'numpy.distutils.fcompiler.gnu.Gnu95FCompiler'>
version_cmd     = ['/opt/cray/pe/gcc/12.2.0/bin/gfortran', '-dumpversion']
compiler_f77    = ['ftn', '-Wall', '-g', '-ffixed-form', '-fno-second-underscore', '-fPIC', '-O3', '-funroll-loops']
compiler_f90    = ['ftn', '-Wall', '-g', '-fno-second-underscore', '-fPIC', '-O3', '-funroll-loops']
compiler_fix    = ['ftn', '-Wall', '-g', '-ffixed-form', '-fno-second-underscore', '-Wall', '-g', '-fno-second-underscore', '-fPIC', '-O3', '-funroll-loops']
linker_so       = ['/opt/cray/pe/gcc/12.2.0/bin/gfortran', '-Wall', '-g', '-Wall', '-g', '-shared']
archiver        = ['/opt/cray/pe/gcc/12.2.0/bin/gfortran', '-cr']
ranlib          = ['/opt/cray/pe/gcc/12.2.0/bin/gfortran']
linker_exe      = ['/opt/cray/pe/gcc/12.2.0/bin/gfortran', '-Wall', '-Wall']
version         = LooseVersion ('12.2.0')
libraries       = ['gfortran']
library_dirs    = ['/opt/cray/pe/gcc/12.2.0/snos/lib/gcc/x86_64-suse-linux/12.2.0/../../../../lib64', '/opt/cray/pe/gcc/12.2.0/snos/lib/gcc/x86_64-suse-linux/12.2.0/../../../../lib64', '/software/setonix/2023.08/software/linux-sles15-zen3/gcc-12.2.0/python-3.10.10-bk4mjnuv6ufkvy3gb5h62l65dgv6zost/lib']
object_switch   = '-o '
compile_switch  = '-c'
include_dirs    = ['/software/setonix/2023.08/software/linux-sles15-zen3/gcc-12.2.0/python-3.10.10-bk4mjnuv6ufkvy3gb5h62l65dgv6zost/include/python3.10']

The linker_so entry calls gfortran directly, so the link step does not get the MPI libraries and library paths that the "ftn" wrapper would add.
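One way to check this is to list the shared libraries the built module records as dependencies (its DT_NEEDED entries). If no MPI library appears there, the dynamic loader has nothing to resolve mpi_comm_rank_ against at import time. This sketch assumes binutils' readelf is on PATH; the helper names are hypothetical, and the "mpi" substring test is only a heuristic.

```python
import re
import subprocess

# Matches readelf -d lines such as:
#  0x0000000000000001 (NEEDED)             Shared library: [libgfortran.so.5]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def parse_needed(readelf_output: str) -> list[str]:
    """Return the DT_NEEDED library names found in `readelf -d` output."""
    return NEEDED_RE.findall(readelf_output)


def needed_libraries(so_path: str) -> list[str]:
    """Run `readelf -d` on a shared object and return its DT_NEEDED entries."""
    result = subprocess.run(["readelf", "-d", so_path],
                            capture_output=True, text=True, check=True)
    return parse_needed(result.stdout)


def has_mpi_dependency(libs: list[str]) -> bool:
    """Heuristic: does any recorded dependency have 'mpi' in its name?"""
    return any("mpi" in lib.lower() for lib in libs)


if __name__ == "__main__":
    import sys
    libs = needed_libraries(sys.argv[1])
    print("\n".join(libs))
    print("MPI dependency recorded:", has_mpi_dependency(libs))
```

Pass the path of helloworld.cpython-310-x86_64-linux-gnu.so as the only argument; ldd on the same file gives similar information.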

This issue is very similar (if not identical) to
#16481 (f2py undefined symbol with PGI fortran compiler and MPI calls)
but here with the gfortran compiler (or the "ftn" wrapper) on an HPE-Cray EX cluster. In that issue, explicitly using <F90> in the "linker_so" setting did the trick. So I looked at the gnu.py file and found that this fix does not apply here, because the <F90> setting is already there:

    possible_executables = ['gfortran', 'f95']
    executables = {
        'version_cmd'  : ["<F90>", "-dumpversion"],
        'compiler_f77' : [None, "-Wall", "-g", "-ffixed-form",
                          "-fno-second-underscore"],
        'compiler_f90' : [None, "-Wall", "-g",
                          "-fno-second-underscore"],
        'compiler_fix' : [None, "-Wall", "-g", "-ffixed-form",
                          "-fno-second-underscore"],
        'linker_so'    : ["<F90>", "-Wall", "-g",],
        'archiver'     : ["ar", "-cr"],
        'ranlib'       : ["ranlib"],
        'linker_exe'   : [None, "-Wall"]
    }

I then modified gnu.py to add all the paths and libraries that the ftn wrapper would have passed at the link step:

'linker_so'    : ["<F90>", "-Wall", "-g",
                          "-I/opt/cray/pe/mpich/8.1.25/ofi/gnu/9.1/include",
                          "-I/opt/cray/pe/libsci/23.02.1.1/GNU/9.1/x86_64/include",
                          "-I/opt/cray/pe/dsmml/0.2.2/dsmml//include",
                          "-I/opt/cray/xpmem/2.5.2-2.4_3.47__gd0f7936.shasta/include",
                          "-L/opt/cray/pe/mpich/8.1.25/ofi/gnu/9.1/lib",
                          "-L/opt/cray/pe/libsci/23.02.1.1/GNU/9.1/x86_64/lib",
                          "-L/opt/cray/pe/dsmml/0.2.2/dsmml//lib",
                          "-L/opt/cray/xpmem/2.5.2-2.4_3.47__gd0f7936.shasta/lib64",
                          "-ldl",
                          "-Wl,--as-needed,-lsci_gnu_82_mpi,--no-as-needed",
                          "-Wl,--as-needed,-lsci_gnu_82,--no-as-needed",
                          "-Wl,--as-needed,-lmpifort_gnu_91,--no-as-needed",
                          "-Wl,--as-needed,-lmpi_gnu_91,--no-as-needed",
                          "-Wl,--as-needed,-ldsmml,--no-as-needed",
                          "-lxpmem"],

With that change the build again finishes without errors, but importing the module in Python fails with the same error:

$ python -c "import helloworld"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /software/projects/bq2/espinosa/f2py/f2py_test/helloworld.cpython-310-x86_64-linux-gnu.so: undefined symbol: mpi_comm_rank_

So even with the ftn wrapper's link flags added by hand, the same error persists. What could be the source of the error?
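One plausible explanation, offered as an assumption rather than a confirmed diagnosis: distutils-style link steps build the command as the linker_so list, then the object files, then the library options. GNU ld's --as-needed records a library as a dependency only if, at that point in the link, it satisfies an undefined reference from an object already seen; objects appearing later do not count. The MPI libraries wrapped in -Wl,--as-needed,...,--no-as-needed inside linker_so therefore come before the module's objects and can be dropped, and because -shared tolerates undefined symbols, the build still succeeds and the failure only surfaces at import. The hypothetical checker below flags that pattern in a link command.

```python
def as_needed_libs_before_objects(cmd: list[str]) -> list[str]:
    """Return -l libraries inside --as-needed regions that precede the first .o file.

    Handles the comma-joined -Wl,--as-needed,-lfoo,--no-as-needed style and
    bare -Wl,--as-needed / -Wl,--no-as-needed toggles around plain -l options.
    """
    as_needed = False
    flagged = []
    for arg in cmd:
        if arg.endswith(".o"):
            break  # only libraries listed before any object file are checked
        if arg.startswith("-Wl,"):
            for part in arg[len("-Wl,"):].split(","):
                if part == "--as-needed":
                    as_needed = True
                elif part == "--no-as-needed":
                    as_needed = False
                elif part.startswith("-l") and as_needed:
                    flagged.append(part[2:])
        elif arg.startswith("-l") and as_needed:
            flagged.append(arg[2:])
    return flagged


if __name__ == "__main__":
    # Hypothetical command shaped like linker_so + objects + library options.
    linker_so = ["gfortran", "-Wall", "-g", "-shared",
                 "-Wl,--as-needed,-lmpifort_gnu_91,--no-as-needed",
                 "-Wl,--as-needed,-lmpi_gnu_91,--no-as-needed"]
    cmd = linker_so + ["helloworld.o"] + ["-lgfortran", "-o", "helloworld.so"]
    print(as_needed_libs_before_objects(cmd))  # ['mpifort_gnu_91', 'mpi_gnu_91']
```

If the MPI libraries are flagged, one thing to try is passing them through f2py's own -L<dir> and -l<name> options, which end up after the object files, instead of through linker_so. This has not been verified on the Cray system.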

(A ctypes-based approach works for the same kind of MPI test function, so I am wondering what goes wrong with the f2py approach.)
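Since the ctypes route works, one possible stopgap (a workaround, not a fix for the link step) is to load the MPI libraries into the global symbol namespace before importing the f2py module, so the dynamic loader can resolve mpi_comm_rank_ against them. The directory and file names below are assumptions derived from the -L/-l flags listed earlier (-lmpifort_gnu_91 maps to libmpifort_gnu_91.so), and it is assumed that the Fortran bindings live in libmpifort; preload_global is a hypothetical helper.

```python
import ctypes
import importlib


def preload_global(paths):
    """dlopen each shared library with RTLD_GLOBAL so its symbols are visible
    to extension modules loaded afterwards; returns the ctypes handles."""
    return [ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL) for path in paths]


if __name__ == "__main__":
    # Assumed location, taken from the -L flag above; adjust to the loaded PrgEnv modules.
    mpi_lib_dir = "/opt/cray/pe/mpich/8.1.25/ofi/gnu/9.1/lib"
    handles = preload_global([f"{mpi_lib_dir}/libmpi_gnu_91.so",
                              f"{mpi_lib_dir}/libmpifort_gnu_91.so"])
    helloworld = importlib.import_module("helloworld")  # the f2py-built module
    helloworld.sayhello()
```

The preload has to happen before the first import of helloworld in the process; once the import has failed, retrying it after preloading in the same session may also work, but starting a fresh interpreter is the cleaner test.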

Reproduce the code example:

See the description above, which includes the code snippets.

Error message:

$ python -c "import helloworld"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /software/projects/bq2/espinosa/f2py/f2py_test/helloworld.cpython-310-x86_64-linux-gnu.so: undefined symbol: mpi_comm_rank_

Runtime information:

$ python -c "import sys, numpy; print(numpy.__version__); print(sys.version)"
1.23.4
3.10.10 (main, Aug 25 2023, 00:41:58) [GCC 12.2.0 20220819 (HPE)]

Context for the issue:

Users already have sets of post-processing scripts and functions that use f2py with Fortran MPI code, and this error is breaking their workflows. They currently have to move their data to an external cluster to do the post-processing, which is definitely not ideal.

@rgommers (Member)

Thanks for the report @AlexisEspinosaGayosso

When reviewing the output of the compilation, I can see that the "ftn" wrapper (which provides all the "options machinery" for compiling MPI code in the HPE-Cray) is used for compilation, but not for linking (partial output below):
[...]
So, it seems that even if the ftn wrapper was used in the linking step, the same error may persists. What would be the error source then?

That is hard to say without being able to reproduce the issue. My first recommendation is to check whether things work for you with numpy 1.26.2 or the main branch, because we have switched from distutils to meson, both for building numpy itself and in f2py.

lcrippa commented Aug 29, 2024

Hi,
I seem to be running into a similar problem on a Debian setup with Open MPI 4.1.6. The Fortran compiler is mpif90, f2py is version 1.26.4, and Python is 3.12.

If I create a similar Fortran file and run f2py with
f2py --verbose -c helloworld.f90 -m helloworld
the .so file is created, but cannot be imported due to missing symbols.

Where should I look for the relevant log files? The on-screen output of f2py with the meson backend is rather terse.

Thanks!

mattip (Member) commented Aug 30, 2024

@HaoZeke is there an easy way to add the meson builddir option for build artifacts to f2py?

HaoZeke (Member) commented Aug 30, 2024

@HaoZeke is there an easy way to add the meson builddir option for build artifacts to f2py?

f2py --build-dir blah should create the project with the meson builddir inside it, so the meson.build can be edited...
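A sketch of that suggestion for locating the build details: run f2py with the meson backend and an explicit --build-dir, then search the kept directory recursively for the generated meson.build and meson's meson-log.txt (searched recursively because the exact layout inside the build directory is not stated here). The helper names and the f2py_build directory name are hypothetical; --dep is the option mentioned in the maintainer's reply, and whether --dep mpi picks up the MPI Fortran libraries is untested.

```python
import subprocess
import sys
from pathlib import Path


def f2py_meson_command(source, module, build_dir, deps=()):
    """Build an f2py invocation that uses the meson backend and keeps its build directory."""
    cmd = [sys.executable, "-m", "numpy.f2py", "-c", source, "-m", module,
           "--backend", "meson", "--build-dir", str(build_dir)]
    for dep in deps:
        cmd += ["--dep", dep]  # each dependency is passed as a separate --dep
    return cmd


def find_build_files(build_dir):
    """Locate meson.build files and meson-log.txt anywhere under the build directory."""
    root = Path(build_dir)
    return sorted(root.rglob("meson.build")) + sorted(root.rglob("meson-log.txt"))


if __name__ == "__main__":
    build_dir = Path("f2py_build")
    subprocess.run(f2py_meson_command("helloworld.f90", "helloworld", build_dir), check=False)
    for path in find_build_files(build_dir):
        print(path)
```

Meson takes the Fortran compiler from the FC environment variable, so running this with FC=mpif90 set is the meson-side counterpart of the F90=ftn setting used in the original report; the generated meson.build can then be edited and rebuilt by hand.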

I'll look into this issue soon, seems like --dep should handle things more gracefully.
