10000 Using OpenBLAS for the released wheels · Issue #10105 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Using OpenBLAS for the released wheels #10105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ghost opened this issue Nov 10, 2017 · 22 comments
Closed

Using OpenBLAS for the released wheels #10105

ghost opened this issue Nov 10, 2017 · 22 comments

Comments

@ghost
Copy link
ghost commented Nov 10, 2017

Hey, I just wanted to let you know that OpenBLAS is available for windows and that you should include it in the released wheels. See the numpy appveyor.yml for an example.

@lesteve
Copy link
Member
lesteve commented Nov 10, 2017

AFAIK we rely on the BLAS shipped with numpy so I don't think we have anything to do on the scikit-learn side.

@rth
Copy link
Member
rth commented Nov 10, 2017

If I understand correctly, in the current setup appveyor installs the scipy/numpy wheels from http://www.lfd.uci.edu/~gohlke/pythonlibs/ which are built against MKL,

# Fetch numpy and scipy wheels from the sklearn rackspace wheelhouse.
# Those wheels were collected from http://www.lfd.uci.edu/~gohlke/pythonlibs/
# This is a temporary solution. As soon as numpy and scipy provide official
# wheel for windows we ca delete this --find-links line.
--find-links http://28daf2247a33ed269873-7b1aad3fab3cc330e1fd9d109892382a.r6.cf2.rackcdn.com/
# fix the versions of numpy to force the use of numpy and scipy to use the whl
# of the rackspace folder instead of trying to install from more recent
# source tarball published on PyPI
numpy==1.9.3
scipy==0.16.0

so that would make scikit-learn wheels on windows also dependent on MKL?

The latest versions of numpy / scipy do have windows wheels on PyPi (according to what you said, built with OpenBlas). However, that's not the case for the versions currently pinned in appveyor. I'm not sure how backward compatibility works here: would it make sense incrementing the numpy/scipy versions in appveyor for v0.20 to those that have an official windows wheel, or not?

cc @ogrisel

@lesteve
Copy link
Member
lesteve commented Nov 10, 2017

Note OpenBLAS in Windows wheels is only on numpy master numpy/numpy#9645.

The latest versions of numpy / scipy do have windows wheels on PyPi (according to what you said, built with OpenBlas).

Looking at https://pypi.python.org/pypi/numpy it seems that for the latest release of numpy i.e. 1.13:

Windows wheels are linked against the ATLAS BLAS / LAPACK library, restricted to SSE2 instructions, so may not give optimal linear algebra performance for your machine.

@ghost
Copy link
Author
ghost commented Nov 10, 2017

You seem to be calling system info though. IIUC it won't find the numpy BLAS.

@lesteve
Copy link
Member
lesteve commented Nov 10, 2017

You seem to be calling system info though.

Not sure what you mean by this. I have to admit I am not an expert on neither BLAS nor wheels and even less on Windows. @ogrisel may be the right person to pitch in.

@ghost
Copy link
Author
ghost commented Nov 11, 2017

So this is what I'm talking about:

https://github.com/scikit-learn/scikit-learn/tree/fcb706a7c0074a16bd472ca284145be2e0f7c936/sklearn/src/cblas

Numpy will have a BLAS implementation, but you only can use that implementation through the numpy API. If you call the BLAS API directly, then you won't be using numpy's BLAS, you'll be using your own BLAS.

When I say "you seem to be calling system info," what I mean is that your setup script is looking for a BLAS implementation on the system, and it's not finding one. This is clearly apparent in the Appveyor output.

The solution to this problem is to copy the numpy appveyor script:

https://github.com/numpy/numpy/blob/519991d0f78cc7d37e7dcdd7c0d69a9aa1338ebd/.appveyor.yml#L101-L130:

  # Install "openblas.a" to PYTHON\lib
  # Library provided by Matthew Brett at https://github.com/matthew-brett/build-openblas
  - ps: |
      $PYTHON_ARCH = $env:PYTHON_ARCH
      $PYTHON = $env:PYTHON
      If ($PYTHON_ARCH -eq 32) {
          $OPENBLAS = $env:OPENBLAS_32
      } Else {
          $OPENBLAS = $env:OPENBLAS_64
      }
      $clnt = new-object System.Net.WebClient
      $file = "$(New-TemporaryFile).zip"
      $tmpdir = New-TemporaryFile | %{ rm $_; mkdir $_ }
      $destination = "$PYTHON\lib\openblas.a"

      echo $file
      echo $tmpdir
      echo $OPENBLAS

      $clnt.DownloadFile($OPENBLAS,$file)
      Get-FileHash $file | Format-List

      Expand-Archive $file $tmpdir      

      rm $tmpdir\$PYTHON_ARCH\lib\*.dll.a
      $lib = ls $tmpdir\$PYTHON_ARCH\lib\*.a | ForEach { ls $_ } | Select-Object -first 1
      echo $lib

      cp $lib $destination
      ls $destination

@ghost
Copy link
Author
ghost commented Nov 11, 2017

Also, MKL is available if you want it. They've updated their license to be compatible with OSS projects.

@rth
Copy link
Member
rth commented Nov 15, 2017

@xoviat Thanks for the additional details!

I haven't worked on wheel generation, but according to the mainter docs, scikit-learn wheels for releases are generated in https://github.com/MacPython/scikit-learn-wheels . The appveyor/travis setup there doesn't appear to be strictly identical to those in the main scikit-learn repo.

For Linux, as far as I understand, wheels are built with OpenBLAS with the help of matthew-brett/multibuild (that you are well aware of). Since multibuild also supports appveyor, wouldn't it make sense to do something similar on Windows (particularly to avoid copy-pasting chunks of code between projects)?

Also even if OpenBLAS is used in Windows wheel for the direct BLAS calls, it wouldn't it be preferable to wait for numpy to release official windows wheels with OpenBLAS? Otherwise, it would mean mixing different BLAS libraries for the direct calls and those made via numpy which might be a bit awkward.

Just my 2c. Contributors of the scikit-learn-wheels repo are the people who would be much more knowledgeable about this (@matthew-brett, ogrisel, and @amueller ).

@ghost
Copy link
Author
ghost commented Nov 16, 2017

Also even if OpenBLAS is used in Windows wheel for the direct BLAS calls, it wouldn't it be preferable to wait for numpy to release official windows wheels with OpenBLAS? Otherwise, it would mean mixing different BLAS libraries for the direct calls and those made via numpy which might be a bit awkward.

This has no practical impact.

Since multibuild also supports appveyor, wouldn't it make sense to do something similar on Windows (particularly to avoid 8000 copy-pasting chunks of code between projects)?

Would be good to reuse code.

@rth
Copy link
Member
rth commented Nov 29, 2017

@xoviat We have discussed this in more detail, it is certainly worth doing, but it is uncertain who would have enough bandwidth to make it happen. Thanks for opening this issue.

Long-term approach for this could be to migrate the explict BLAS calls to scipy Cython BLAS API which would allow removing the bundled BLAS, or the necessity to explicitly link against BLAS. This would require scipy 0.16.0 though, while currently, the minimal supported version is 0.13.3.

@ghost
Copy link
Author
ghost commented Nov 29, 2017

I will have an openblas available as soon as some PRs are merged.

@rth
Copy link
Member
rth commented Nov 29, 2017

I will have an openblas available as soon as some PRs are merged.

Great, thanks for looking into it! https://github.com/MacPython/scikit-learn-wheels is the place to make PRs scikit-learn wise, I think. I'm currently working on #9485; nightly wheels might be a good place to start testing this.

@ghost
Copy link
Author
ghost commented Nov 30, 2017

You can prepend this to the install script:

      CONDA_INSTALL_LOCN: C:\\Miniconda36-x64

  # Add path, activate `conda` and update conda.
  - cmd: call %CONDA_INSTALL_LOCN%\Scripts\activate.bat
  - cmd: conda update --yes --quiet conda
  
  # Add our channels.
  - cmd: conda config --add channels defaults
  - cmd: conda config --add channels conda-forge
  - cmd: conda install --yes openblas

  # Fix the OpenBLAS library name
  - ps: cp $env:CONDA_INSTALL_LOCN\Library\lib\libopenblas.lib $env:CONDA_INSTALL_LOCN\Library\lib\openblas.lib

You will also need numpy/numpy#9942 to allow numpy to detect the library location automatically.

@jakirkham
Copy link
Contributor

Due to some recent debugging in a different issue, I've learned that the scikit-learn wheels do not appears to be linking to the same BLAS used by numpy, but are instead linking against scikit-learn's internal BLAS.

Determined this by extracting scikit_learn-0.19.2-cp36-cp36m-manylinux1_x86_64.whl and running strings sklearn/utils/arrayfuncs.cpython-36m-x86_64-linux-gnu.so | grep 'ATL_\|cblas_' in the extracted zip (listed below). The ATL_* symbols in particular show these come from ATLAS. Whereas the NumPy wheel is linked against OpenBLAS.

cblas_dcopy
cblas_drot
cblas_drotg
cblas_scopy
cblas_srot
cblas_srotg
ATL_drefcopy
ATL_srefrotg
ATL_srefrot
ATL_drefrot
ATL_srefcopy
ATL_drefrotg

We know NumPy wheel is linked against OpenBLAS from it's configuration information (below). Though if one wants to be very sure, this can also be verified by running ldd numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so and finding the OpenBLAS linkage in the list. This all taken from the wheel numpy-1.14.5-cp36-cp36m-manylinux1_x86_64.whl.

$ python -c "import numpy; numpy.__config__.show()"
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

So why does this happen? My guess (though I haven't tested it) is the configuration information above. In particular, the library_dirs reference where openblas was installed when the NumPy wheel was built. However after auditwheel is run the shared objects are moved into the wheel. So the paths are no longer correct. As a result, when scikit-learn builds and looks for the BLAS using NumPy's information, it doesn't find it and falls back to the internal BLAS.

@jakirkham
Copy link
Contributor

Of course, there is always the chance I've messed something up above. If so, please let me know.

@rth
Copy link
Member
rth commented Jul 20, 2018

Thanks for investigating @jakirkham . I'm not an expert on OpenBLAS packaging either, but your comments seem very plausible. From what I saw, both numpy and scipy wheels bundle openblas under .libs/ folder, then dynamically link to it. Scikit-learn manylinux wheels don't, and as you demonstrated also include ATLAS symbols from the vendored BLAS. I can confirm that conda install (from default channel) doesn't include those.

I have opened an issue about it in MacPython/scikit-learn-wheels#6

@jeremiedbb would you be able to ask @ogrisel 's input about this ? :)

While this is not release blocking, if true it means that pip installed scikit-learn on Linux might be sensibly slower that it needs to be. Tagging for v0.20 to raise awareness about this issue.

@rth rth added this to the 0.20 milestone Jul 20, 2018
@jakirkham
Copy link
Contributor

FWIW similar inspection of the macOS wheels shows they are linked to Accelerate. Expect the issue shares the same cause as the Linux case.

While not slow necessarily, Accelerate does have known issues with forking, which makes it problematic when using multiprocessing. Though IIUC fork+exec (e.g. loky) or spawn shouldn't have this problem.

@ogrisel
Copy link
Member
ogrisel commented Jul 20, 2018

No, we don't include openblas in the released wheels. When no BLAS is available at build time, our setup.py are configured to fallback on unoptimized source-based ATLAS code. This is what is used in our linux and windows wheels at the moment.

Ideally, our cython code that need BLAS could cimport the BLAS functions exported by scipy using:
htt 8000 ps://docs.scipy.org/doc/scipy-0.18.1/reference/linalg.cython_blas.html

However, this is not yet possible as we currently support old versions of scipy that do not provide this feature. This should be possible to rewrite our cython code in scikit-learn 0.21 to use this if we bump up the dependency.

In the meantime, we can ship openblas in the wheels instead of using the unoptimized source-based ATLAS fallback. However, that requires some work on the wheel building infrastructure.

@ogrisel
Copy link
Member
ogrisel commented Jul 20, 2018

Note that the lack of BLAS in our wheels is only impacting functions that call into BLAS directly in Cython code. The parts of scikit-learn that uses numpy or scipy will use the BLAS implementations from those libs.

@ogrisel
Copy link
Member
ogrisel commented Jul 20, 2018

While it would be nice to update our wheel system to build against openblas, I am not sure it's worth blocking the 0.20 release for that.

@rth rth removed this from the 0.20 milestone Jul 20, 2018
@rth
Copy link
Member
rth commented Jul 20, 2018

No, we don't include openblas in the released wheels.

Thanks for the explanations @ogrisel ! I got confused by the discussion on the numpy mailing list about this (in retrospective I'm not sure why) and though that only applied for Windows. Untaged from 0.20. You are right, I imagine one could always make another minor release 0.20.x if someone does the work on including OpenBLAS in scikit-learn's manylinux wheels.

Ideally, our cython code that need BLAS could cimport the BLAS functions exported by scipy using:

Opened an issue about this in #11638

@rth
Copy link
Member
rth commented Sep 27, 2018

Closing in favor of using the Cython BLAS API (#11638) which should become possible in 0.21 after bumping the minimal scipy requirement (#12184).

@rth rth closed this as completed Sep 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants
0