
Numpy inverse of very large matrix returns all-zero matrix without error #5898


Closed

sylvainr opened this issue May 20, 2015 · 31 comments

@sylvainr

I have been experimenting with large matrix inversions (I have to invert the whole matrix in my specific case) to check the runtime. It all works well until I try to invert a 50000 by 50000 matrix:

In [10]: A = np.eye(50000)

In [11]: A
Out[11]: 
array([[ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]])

In [12]: %time X = np.linalg.inv(A)
CPU times: user 2.95 s, sys: 3.24 s, total: 6.19 s
Wall time: 6.19 s

In [13]: X.shape
Out[13]: (50000, 50000)

In [14]: np.abs(X).sum()
Out[14]: 0.0

The run time of ~6 seconds is suspiciously short. But more importantly: inv() should not return an all-zero matrix without raising any error!

Am I missing something here? Is it a result I should be expecting?


About my environment:

Right before running the snippet above, I made sure I had enough RAM:

$ cat /proc/meminfo | grep MemFree
MemFree:        138568436 kB

Running Conda 3.12.0 with:
numpy 1.9.2 py27_p0 [mkl]

OS: Linux seftrtools1 3.16.0-34-generic #47~14.04.1-Ubuntu SMP Fri Apr 10 17:49:16 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ python --version
Python 2.7.6
$ ulimit
unlimited
@sylvainr
Author

Just tested with the MKL-free numpy and got exactly the same problem and behavior.

$ conda list | grep numpy
numpy                     1.9.2                    py27_0  

I've also tested with conda's numpy 1.8.2: same problem.

Then I tried with Ubuntu 14.04.2 LTS's python+numpy (apt-get installed python-numpy 1:1.8.2-0ubuntu0.1): I get exactly the same problem every time.

@argriffing
Contributor

Is N=50000 the smallest value of N where you see this behavior?

@sylvainr
Copy link
Author

It appears that with N=46000 it works, but it starts failing with N=47000.

A gentleman on IRC suggested using scipy's linalg.inv function to see if that worked. It appears to be working (at least it does not return an all-zero matrix within a few seconds). I still need to verify that the output makes sense.

@njsmith
Member
njsmith commented May 20, 2015

Does it work with N=46335 but fail with N=46345? Sounds like a bug in
handling matrices that are too big for BLAS's 32-bit indices...


@argriffing
Contributor

It appears that with N=46000 it works, but it starts failing with N=47000.

If I were a gambler I might bet it has something to do with this:

>>> 46340**2 < 1<<31 < 46341**2
True
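
A minimal illustration of the suspected failure mode (this is just the arithmetic, not numpy's actual code path): a flat index n*n computed in a signed 32-bit integer wraps around once n reaches 46341.

import numpy as np

# Illustrative only: int32 products wrap modulo 2**32, which is the
# kind of overflow a 32-bit flat index n*n would hit.
n = np.int32(46341)
with np.errstate(over='ignore'):  # silence the overflow RuntimeWarning
    print(n * n)  # -2147479015 instead of 2147488281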

@argriffing
Contributor

^what @njsmith said

@sylvainr
Author

@argriffing: you are spot on, it fails with N=46341, but works with N=46340. Just as you predicted.

Definitely looks like a 32-bit variable bug!

For now I'll use the scipy.linalg.inv() method.
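
For reference, the workaround mentioned above looks like this (it only helps if scipy is linked against a BLAS/LAPACK that handles the size, as discussed further down):

import numpy as np
from scipy import linalg

A = np.eye(1000)           # small N shown here as a quick sanity check
X = linalg.inv(A)          # scipy's inv, instead of np.linalg.inv
assert np.abs(X).sum() == A.shape[0]  # the inverse of I is I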

@argriffing
Contributor

@sylvainr I've opened the PR #5899 but I don't have enough RAM to check it.

@argriffing
Contributor

I'm also curious if np.linalg.inv may have worked with larger inputs in version 1.7 and earlier (before the generalized ufuncs PR #3220).

@sylvainr
Copy link
Author

@argriffing: I reverted my numpy to numpy-1.6.2 (thanks conda) and ran my test script. It looks like it works (i.e. it does not return a zero-filled matrix within a few seconds... it has been running on all my cores for a few minutes now, which typically indicates success).

Will try to run your PR (if I figure out how to do it).

@njsmith
Member
njsmith commented May 21, 2015

If you have a working dev environment, then something like this:

git clone https://github.com/argriffing/numpy.git
cd numpy
git checkout improve-umath-linalg
./runtests.py --ipython

should give you an ipython shell whose numpy contains @argriffing's proposed changes. (Check numpy.__version__ to be sure.)


@sylvainr
Author

@njsmith thanks for your help. I gave it a shot but ended up with a segfault when trying to invert a large matrix (it works for small ones, cf. log below):

$ ./runtests.py --python
Building, see build.log...
Build OK
Python 2.7.9 |Continuum Analytics, Inc.| (default, Apr 14 2015, 12:54:25) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import numpy as np
>>> np.__version__
'1.10.0.dev0+ad4aa25'
>>> A = np.eye(1000)
>>> X = np.linalg.inv(A)
>>> X.shape
(1000, 1000)
>>> A
array([[ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]])
>>> np.abs(A).sum()
1000.0
>>> A = np.eye(10000)
>>> X
array([[ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]])
>>> np.abs(X).sum()
1000.0
>>> X = np.linalg.inv(A)
>>> np.abs(X).sum()
10000.0
>>> A = np.eye(50000)
>>> del X
>>> X = np.linalg.inv(A)
Segmentation fault

@sylvainr changed the title from "Numpy/MKL inverse of very large matrix returns all-zero matrix without error" to "Numpy inverse of very large matrix returns all-zero matrix without error" on May 21, 2015
@njsmith
Member
njsmith commented May 21, 2015

@ovillellas @pv: any ideas?


@argriffing
Contributor
>>> X = np.linalg.inv(A)
Segmentation fault

lol. I guess the PR didn't completely fix the problem.

Does this mean it's hitting a similar problem somewhere else, maybe within the gesv call itself? Here's a MAGMA thread that mentions this kind of limit in LAPACK/BLAS implementations. The LP64 vs. ILP64 terminology in that thread refers to the data model. There's an MKL page related to this distinction, so maybe it would be possible to build an MKL BLAS/LAPACK that would not segfault, assuming the error is not within the numpy code?

Edit: Here's a blog post mentioning numpy, scipy, MKL, and the ILP64 interface.

@charris
Member
charris commented May 21, 2015

Note that numpy dot also has the problem. It is possible to fix it there by avoiding cblas for huge arrays.
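
One way to picture "avoiding cblas for huge arrays" on the user side is to split a big product into row blocks so that no single BLAS call sees an output near 2**31 elements. A rough sketch, with the caveat that whether it actually sidesteps a given BLAS's limits depends on where that BLAS overflows internally:

import numpy as np

def blocked_dot(a, b, block=8192):
    # Compute np.dot(a, b) in row blocks, so each underlying BLAS call
    # produces an output chunk well below 2**31 elements.
    out = np.empty((a.shape[0], b.shape[1]), dtype=np.result_type(a, b))
    for i in range(0, a.shape[0], block):
        out[i:i + block] = np.dot(a[i:i + block], b)
    return out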

@njsmith
Member
njsmith commented May 21, 2015

The way this works in the CBLAS API is terrible, and I think LAPACK follows the same model. IIUC the size arguments to dgemm or whatever always have type 'int', i.e. they are 31 bits, and this is an immutable fact about the standard APIs regardless of whether you are on a 32- or 64-bit system.

There are non-standard ways of working around this, but you have to be careful. E.g. it is sometimes possible to compile a BLAS library to use 64-bit arguments, but then you also have to change the function names, because otherwise you risk other code that gets linked into your process and uses BLAS finding the 64-bit symbols where it was expecting 32-bit ones and crashing. Since upstream thinks that the only people who use BLAS/LAPACK are people building one-off supercomputer programs in an isolated environment, they don't think such collisions are a problem...

I don't know if MKL has a way to handle 64-bit integer inputs.

What does scipy do here that lets it work without crashing? Or similarly for the old numpy code? I guess it must be possible to pull off somehow...
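
The symbol-renaming convention described above can be probed at runtime. A hedged sketch using ctypes (the "64_" suffix is only a convention some ILP64 OpenBLAS builds use, not a standard, so treat the symbol names as assumptions):

import ctypes
import ctypes.util

# Check a loaded BLAS for plain LP64 symbols vs. ILP64-suffixed ones.
path = ctypes.util.find_library("openblas")
if path:
    lib = ctypes.CDLL(path)
    for sym in ("dgemm_", "dgemm_64_"):
        print(sym, hasattr(lib, sym))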
@argriffing
Contributor

@sylvainr What if you use the PR that segfaulted, but also use MKL (without fiddling with LP64/ILP64)? The idea is that if MKL's GESV is implemented more intelligently than the one hardcoded into numpy, it might invert a large nxn matrix without segfaulting, by being careful not to overflow integers internally, even though its interface specifies the number n as a 32-bit integer (n does not overflow, but n*n does).
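
That distinction is easy to check in plain Python: for the sizes in this thread, n fits comfortably in a signed 32-bit integer while the flat element count n*n does not.

n = 50000
print(n < 2**31)      # True:  n itself fits in a signed 32-bit int
print(n * n < 2**31)  # False: the flat element count n*n overflows it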

@argriffing
Contributor

For what it's worth, with my PR I was able to inv and cholesky_lo an NxN float32 identity matrix with N=46341 without segfaulting, using code like np.linalg._umath_linalg.cholesky_lo(a, signature='f->f'), even though I have only a couple dozen gigabytes of RAM. That hack was required because numpy otherwise wants to convert float32->float64 before doing linear algebra, and I don't have enough RAM for that. I did not try N as large as 50000. I think I was using openblas.

Edit: Here's an open issue related to the automatic float32->float64 conversion: #3722
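
A small usage sketch of that gufunc call (np.linalg._umath_linalg is a private module, so this detail may change between versions):

import numpy as np

# Calling the gufunc directly with an explicit signature keeps the
# computation in float32, avoiding the automatic float32->float64 cast.
a = np.eye(4, dtype=np.float32)
l = np.linalg._umath_linalg.cholesky_lo(a, signature='f->f')
print(l.dtype)  # float32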

@argriffing
Contributor

It looks like the numpy linalg code is currently under construction (in a stronger sense than that all open source software is always under construction), with open questions in #3217 after having merged the PR #3220 that appears to have broken np.linalg.inv for huge matrices.

@pv
Member
pv commented May 21, 2015

All the open questions for numpy linalg are related to extending the API, and fixing legacy behavior. For example, the cast to float64 is legacy behavior, and changing that would break backward compat so it was purposefully not done. So nothing out of the ordinary here, IMHO. The part that is merged is complete.

@sylvainr
Author

@argriffing I won't be able to build numpy with MKL as I don't have MKL's SDK. I have been running the MKL-linked builds through anaconda so far.

@pv
Member
pv commented May 21, 2015

@sylvainr: so to summarize --- the status is that it works with Scipy from Anaconda, which is linked against MKL. However, the Numpy you build is linked against the BLAS/LAPACK library that happens to be installed (check numpy.__config__.show())? The blas_lite.c & lapack_lite.c that Numpy uses if no system BLAS is available have obvious integer overflow issues (f2c generated code, uses C int for flat indices).

@sylvainr
Author

@pv That should answer your question:

$ ./runtests.py --python
Building, see build.log...
Build OK
Python 2.7.9 |Continuum Analytics, Inc.| (default, Apr 14 2015, 12:54:25) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import numpy as np
>>> np.__config__.show()
lapack_info:
  NOT AVAILABLE
lapack_opt_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
blas_info:
  NOT AVAILABLE
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
blas_src_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
atlas_3_10_blas_info:
  NOT AVAILABLE
lapack_src_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
blas_opt_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

Looks like it may be using the buggy blas_lite.c.

I am not familiar with how to tell Numpy which backend library to use (note that I don't have any installed). I am on Ubuntu; would installing the libblas-dev and libblas3 packages be enough?

@pv
Member
pv commented May 21, 2015

@sylvainr: and for the Numpy from Anaconda, does it say it's using MKL? On Ubuntu, you can install libopenblas-dev liblapack-dev.

@sylvainr
Author

Do I need to do some sort of make clean first? I did a python setup.py clean before running the runtests script.

I am still getting a segfault.

$ ./runtests.py --python
Building, see build.log...
Build OK
Python 2.7.9 |Continuum Analytics, Inc.| (default, Apr 14 2015, 12:54:25) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import numpy
>>> numpy.__config__.show()
lapack_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
lapack_src_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
lapack_opt_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE
>>> X = np.linalg.inv(np.eye(50000))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'np' is not defined
>>> import numpy as np
>>> X = np.linalg.inv(np.eye(50000))
Segmentation fault

@sylvainr
Author

This is what I get from conda's python:

~/miniconda/bin$ ./python
Python 2.7.9 |Continuum Analytics, Inc.| (default, Apr 14 2015, 12:54:25) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import numpy as np
>>> np.__config__.show()
lapack_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/home/toto/miniconda/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/home/toto/miniconda/include']
blas_opt_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/home/toto/miniconda/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/home/toto/miniconda/include']
openblas_lapack_info:
  NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/home/toto/miniconda/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/home/toto/miniconda/include']
blas_mkl_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/home/toto/miniconda/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/home/toto/miniconda/include']
mkl_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/home/toto/miniconda/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/home/toto/miniconda/include']
>>> 

@pv
Member
pv commented May 21, 2015

@sylvainr: looks like you are missing lapack_opt_info from the output. Maybe install liblapack-dev and try again? Removing the build/ directory should be enough cleanup.

@sylvainr
Author

Just did that, and it looks like it is not segfaulting right away this time! All my cores are even cranking, indicating it's computing.

My config now:

>>> np.__config__.show()
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = c
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]

@sylvainr
Author

90 mins later, I confirm that it worked:

>>> X = np.linalg.inv(np.eye(50000))
>>> X
array([[ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]])
>>> np.abs(X).sum()
50000.0

@argriffing
Contributor

Note that numpy dot also has the problem. It is possible to fix it there by avoiding cblas for huge arrays.

Indeed, here's an active issue scikit-learn/scikit-learn#4197 where dot involving a 50000 x 50000 matrix is segfaulting and surprising people.

The corresponding numpy issue is #5533. In that thread you can see that I was unable to reproduce the segfault using OpenBLAS, but it was reported to have segfaulted with the Accelerate implementation of BLAS/LAPACK (mac).
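
Until the underlying fixes land, a defensive pre-check before handing very large operands to BLAS-backed routines might look like this (a heuristic sketch; the 2**31 threshold assumes an LP64 BLAS with 32-bit indices):

import numpy as np

def fits_lp64_blas(*arrays):
    # Heuristic guard: flag operands whose element counts could overflow
    # a signed 32-bit index inside an LP64 BLAS/LAPACK build.
    return all(a.size < 2**31 for a in arrays)

A = np.eye(1000)
print(fits_lp64_blas(A, A))  # True for modest sizes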

@mattip
Member
mattip commented Aug 18, 2019

Duplicate of #5533, #5906, and perhaps others. Closing since #5533 suggests some solutions to the 32-bit limit in LAPACK routines.

@mattip closed this as completed Aug 18, 2019