
Numpy inverse of very large matrix returns all-zero matrix without error #5898


Closed

sylvainr opened this issue May 20, 2015 · 31 comments

@sylvainr

I have been experimenting with large matrix inversions (I have to invert the whole matrix in my specific case) to check the runtime. It all works well until I try to invert a 50000 by 50000 matrix:

In [10]: A = np.eye(50000)

In [11]: A
Out[11]: 
array([[ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]])

In [12]: %time X = np.linalg.inv(A)
CPU times: user 2.95 s, sys: 3.24 s, total: 6.19 s
Wall time: 6.19 s

In [13]: X.shape
Out[13]: (50000, 50000)

In [14]: np.abs(X).sum()
Out[14]: 0.0

The run time of ~6 seconds is suspiciously short. But more importantly: inv() should not return an all-zero matrix without raising any error!

Am I missing something here? Is it a result I should be expecting?


About my environment:

Right before running the snippet above, I made sure I had enough RAM:

$ cat /proc/meminfo | grep MemFree
MemFree:        138568436 kB

Running Conda 3.12.0 with:
numpy 1.9.2 py27_p0 [mkl]

OS: Linux seftrtools1 3.16.0-34-generic #47~14.04.1-Ubuntu SMP Fri Apr 10 17:49:16 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ python --version
Python 2.7.6
$ ulimit
unlimited
@sylvainr
Author

Just tested with the MKL-free numpy and got exactly the same problem and behavior.

$ conda list | grep numpy
numpy                     1.9.2                    py27_0  

I've also tested with conda's numpy 1.8.2: same problem.

Then I tried with Ubuntu 14.04.2 LTS's python+numpy (apt-get installed python-numpy 1:1.8.2-0ubuntu0.1): I get exactly the same problem every time.

@argriffing
Contributor

Is N=50000 the smallest value of N where you see this behavior?

@sylvainr
Copy link
Author

It appears that with N=46000 it works, but it starts failing with N=47000.

A gentleman on IRC suggested using scipy's linalg.inv function to see if that worked. It appears to be working (at least it does not return an all-zero matrix within a few seconds). I still need to verify that the output makes sense.

@njsmith
Member
njsmith commented May 20, 2015

Does it work with N=46335 but fail with N=46345? Sounds like a bug in
handling matrices that are too big for BLAS's 32-bit indices...


@argriffing
Contributor

It appears that with N=46000 it works, but it starts failing with N=47000.

If I were a gambler I might bet it has something to do with this:

>>> 46340**2 < 1<<31 < 46341**2
True
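
A minimal illustration of the suspected failure mode (this is just the arithmetic, not numpy's actual code path): a flat index n*n computed in a signed 32-bit integer wraps around once n reaches 46341.

import numpy as np

# Illustrative only: int32 products wrap modulo 2**32, which is the
# kind of overflow a 32-bit flat index n*n would hit.
n = np.int32(46341)
with np.errstate(over='ignore'):  # silence the overflow RuntimeWarning
    print(n * n)  # -2147479015 instead of 2147488281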

@argriffing
Contributor

^what @njsmith said

@sylvainr
Author

@argriffing: you are spot on, it fails with N=46341, but works with N=46340. Just as you predicted.

Definitely looks like a 32-bit variable bug!

For now I'll use the scipy.linalg.inv() method.
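
For reference, the workaround mentioned above looks like this (it only helps if scipy is linked against a BLAS/LAPACK that handles the size, as discussed further down):

import numpy as np
from scipy import linalg

A = np.eye(1000)           # small N shown here as a quick sanity check
X = linalg.inv(A)          # scipy's inv, instead of np.linalg.inv
assert np.abs(X).sum() == A.shape[0]  # the inverse of I is I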

@argriffing
Contributor

@sylvainr I've opened the PR #5899 but I don't have enough RAM to check it.

@argriffing
Contributor

I'm also curious if np.linalg.inv may have worked with larger inputs in version 1.7 and earlier (before the generalized ufuncs PR #3220).

@sylvainr
Copy link
Author

@argriffing: I reverted my numpy to numpy-1.6.2 (thanks conda) and ran my test script. It looks like it works (i.e. it does not return a zero-filled matrix within a few seconds... it has been running on all my cores for a few minutes now, which typically indicates success).

Will try to run your PR (if I figure out how to do it).

@njsmith
Member
njsmith commented May 21, 2015

If you have a working dev environment, then something like this:

git clone https://github.com/argriffing/numpy.git
cd numpy
git checkout improve-umath-linalg
./runtests.py --ipython

should give you an ipython shell whose numpy contains @argriffing's proposed changes. (Check numpy.__version__ to be sure.)


@sylvainr
Author

@njsmith thanks for your help. I gave it a shot but ended up with a segfault when trying to invert a large matrix (it works for small ones, cf. log below):

$ ./runtests.py --python
Building, see build.log...
Build OK
Python 2.7.9 |Continuum Analytics, Inc.| (default, Apr 14 2015, 12:54:25) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import numpy as np
>>> np.__version__
'1.10.0.dev0+ad4aa25'
>>> A = np.eye(1000)
>>> X = np.linalg.inv(A)
>>> X.shape
(1000, 1000)
>>> A
array([[ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]])
>>> np.abs(A).sum()
1000.0
>>> A = np.eye(10000)
>>> X
array([[ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]])
>>> np.abs(X).sum()
1000.0
>>> X = np.linalg.inv(A)
>>> np.abs(X).sum()
10000.0
>>> A = np.eye(50000)
>>> del X
>>> X = np.linalg.inv(A)
Segmentation fault

@sylvainr changed the title from "Numpy/MKL inverse of very large matrix returns all-zero matrix without error" to "Numpy inverse of very large matrix returns all-zero matrix without error" on May 21, 2015
@njsmith
Member
njsmith commented May 21, 2015

@ovillellas @pv: any ideas?


@argriffing
Contributor
>>> X = np.linalg.inv(A)
Segmentation fault

lol. I guess the PR didn't completely fix the problem.

Does this mean it's hitting a similar problem somewhere else, maybe within the gesv call itself? Here's a MAGMA thread that mentions this kind of limit in LAPACK/BLAS implementations. The LP64 vs. ILP64 terminology in that thread refers to the data model. There's an MKL page related to this distinction, so maybe it would be possible to build an MKL BLAS/LAPACK that would not segfault, assuming the error is not within the numpy code?

Edit: Here's a blog post mentioning numpy, scipy, MKL, and the ILP64 interface.

@charris
Member
charris commented May 21, 2015

Note that numpy dot also has the problem. It is possible to fix it there by avoiding cblas for huge arrays.
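
One way to picture "avoiding cblas for huge arrays" on the user side is to split a big product into row blocks so that no single BLAS call sees an output near 2**31 elements. A rough sketch, with the caveat that whether it actually sidesteps a given BLAS's limits depends on where that BLAS overflows internally:

import numpy as np

def blocked_dot(a, b, block=8192):
    # Compute np.dot(a, b) in row blocks, so each underlying BLAS call
    # produces an output chunk well below 2**31 elements.
    out = np.empty((a.shape[0], b.shape[1]), dtype=np.result_type(a, b))
    for i in range(0, a.shape[0], block):
        out[i:i + block] = np.dot(a[i:i + block], b)
    return out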

@njsmith
Member
njsmith commented May 21, 2015

The way this works in the CBLAS API is terrible, and I think LAPACK follows the same model. IIUC the size arguments to dgemm or whatever always have type 'int', i.e. they are 31 bits, and this is an immutable fact about the standard APIs regardless of whether you are on a 32- or 64-bit system.

There are non-standard ways of working around this, but you have to be careful. E.g. it is sometimes possible to compile a BLAS library to use 64-bit arguments, but then you also have to change the function names, because otherwise you risk other code that gets linked into your process and uses BLAS finding the 64-bit symbols where it was expecting 32-bit ones and crashing. Since upstream thinks that the only people who use BLAS/LAPACK are people building one-off supercomputer programs in an isolated environment, they don't think such collisions are a problem...

I don't know if MKL has a way to handle 64-bit integer inputs.

What does scipy do here that lets it work without crashing? Or similarly for the old numpy code? I guess it must be possible to pull off somehow...
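
The symbol-renaming convention described above can be probed at runtime. A hedged sketch using ctypes (the "64_" suffix is only a convention some ILP64 OpenBLAS builds use, not a standard, so treat the symbol names as assumptions):

import ctypes
import ctypes.util

# Check a loaded BLAS for plain LP64 symbols vs. ILP64-suffixed ones.
path = ctypes.util.find_library("openblas")
if path:
    lib = ctypes.CDLL(path)
    for sym in ("dgemm_", "dgemm_64_"):
        print(sym, hasattr(lib, sym))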
@argriffing
Contributor

@sylvainr What if you use the PR that segfaulted, but also use MKL (without fiddling with LP64/ILP64)? The idea is that if MKL's GESV is implemented more intelligently than the one hardcoded into numpy, it might invert a large nxn matrix without segfaulting, by being careful not to overflow integers internally, even though its interface specifies the number n as a 32-bit integer (n does not overflow, but n*n does).
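
That distinction is easy to check in plain Python: for the sizes in this thread, n fits comfortably in a signed 32-bit integer while the flat element count n*n does not.

n = 50000
print(n < 2**31)      # True:  n itself fits in a signed 32-bit int
print(n * n < 2**31)  # False: the flat element count n*n overflows it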

@argriffing
Contributor

For what it's worth, with my PR I was able to inv and cholesky_lo an NxN float32 identity matrix with N=46341 without segfaulting, using code like np.linalg._umath_linalg.cholesky_lo(a, signature='f->f'), even though I have only a couple dozen gigabytes of RAM. That hack was required because numpy otherwise wants to convert float32->float64 before doing linear algebra, and I don't have enough RAM for that. I did not try N as large as 50000. I think I was using openblas.

Edit: Here's an open issue related to the automatic float32->float64 conversion: #3722
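
A small usage sketch of that gufunc call (np.linalg._umath_linalg is a private module, so this detail may change between versions):

import numpy as np

# Calling the gufunc directly with an explicit signature keeps the
# computation in float32, avoiding the automatic float32->float64 cast.
a = np.eye(4, dtype=np.float32)
l = np.linalg._umath_linalg.cholesky_lo(a, signature='f->f')
print(l.dtype)  # float32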

@argriffing
Contributor

It looks like the numpy linalg code is currently under construction (in a stronger sense than that all open source software is always under construction), with open questions in #3217 after having merged the PR #3220 that appears to have broken np.linalg.inv for huge matrices.

@pv
Member
pv commented May 21, 2015

All the open questions for numpy linalg are related to extending the API, and fixing legacy behavior. For example, the cast to float64 is legacy behavior, and changing that would break backward compat so it was purposefully not done. So nothing out of the ordinary here, IMHO. The part that is merged is complete.

@sylvainr
Author

@argriffing I won't be able to build numpy with MKL as I don't have MKL's SDK. I have been running the MKL-linked builds through anaconda so far.

@pv
Member
pv commented May 21, 2015

@sylvainr: so to summarize --- the status is that it works with Scipy from Anaconda, which is linked against MKL. However, the Numpy you build is linked against the BLAS/LAPACK library that happens to be installed (check numpy.__config__.show())? The blas_lite.c & lapack_lite.c that Numpy uses if no system BLAS is available have obvious integer overflow issues (f2c generated code, uses C int for flat indices).

@sylvainr
Author

@pv That should answer your question:

$ ./runtests.py --python
Building, see build.log...
Build OK
Python 2.7.9 |Continuum Analytics, Inc.| (default, Apr 14 2015, 12:54:25) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import numpy as np
>>> np.__config__.show()
lapack_info:
  NOT AVAILABLE
lapack_opt_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
blas_info:
  NOT AVAILABLE
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
blas_src_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
atlas_3_10_blas_info:
  NOT AVAILABLE
lapack_src_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
blas_opt_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

Looks like it may be using the buggy blas_lite.c.

I am not familiar with how to tell Numpy which backend library to use (note that I don't have any installed). I am on Ubuntu; would installing the libblas-dev and libblas3 packages be enough?

@pv
Member
pv commented May 21, 2015

@sylvainr: and for the Numpy from Anaconda, does it say it's using MKL? On Ubuntu, you can install libopenblas-dev liblapack-dev.

@sylvainr
Author

Do I need to do some sort of make clean first? I did a python setup.py clean before running the runtests script.

I am still getting a segfault.

$ ./runtests.py --python
Building, see build.log...
Build OK
Python 2.7.9 |Continuum Analytics, Inc.| (default, Apr 14 2015, 12:54:25) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import numpy
>>> numpy.__config__.show()
lapack_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
lapack_src_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
lapack_opt_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE
>>> X = np.linalg.inv(np.eye(50000))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'np' is not defined
>>> import numpy as np
>>> X = np.linalg.inv(np.eye(50000))
Segmentation fault

@sylvainr
Author

This is what I get from conda's python:

~/miniconda/bin$ ./python
Python 2.7.9 |Continuum Analytics, Inc.| (default, Apr 14 2015, 12:54:25) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import numpy as np
>>> np.__config__.show()
lapack_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/home/toto/miniconda/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/home/toto/miniconda/include']
blas_opt_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/home/toto/miniconda/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/home/toto/miniconda/include']
openblas_lapack_info:
  NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/home/toto/miniconda/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/home/toto/miniconda/include']
blas_mkl_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/home/toto/miniconda/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/home/toto/miniconda/include']
mkl_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/home/toto/miniconda/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/home/toto/miniconda/include']
>>> 

@pv
Member
pv commented May 21, 2015

@sylvainr: looks like you are missing lapack_opt_info from the output. Maybe install liblapack-dev and try again? Removing the build/ directory should be enough cleanup.

@sylvainr
Author

Just did that, and it looks like it is not segfaulting right away this time! All my cores are even cranking, indicating it's computing.

My config now:

>>> np.__config__.show()
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = c
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]

@sylvainr
Author

90 mins later, I confirm that it worked:

>>> X = np.linalg.inv(np.eye(50000))
>>> X
array([[ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]])
>>> np.abs(X).sum()
50000.0

@argriffing
Contributor

Note that numpy dot also has the problem. It is possible to fix it there by avoiding cblas for huge arrays.

Indeed, here's an active issue scikit-learn/scikit-learn#4197 where dot involving a 50000 x 50000 matrix is segfaulting and surprising people.

The corresponding numpy issue is #5533. In that thread you can see that I was unable to reproduce the segfault using OpenBLAS, but it was reported to have segfaulted with the Accelerate implementation of BLAS/LAPACK (mac).
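
Until the underlying fixes land, a defensive pre-check before handing very large operands to BLAS-backed routines might look like this (a heuristic sketch; the 2**31 threshold assumes an LP64 BLAS with 32-bit indices):

import numpy as np

def fits_lp64_blas(*arrays):
    # Heuristic guard: flag operands whose element counts could overflow
    # a signed 32-bit index inside an LP64 BLAS/LAPACK build.
    return all(a.size < 2**31 for a in arrays)

A = np.eye(1000)
print(fits_lp64_blas(A, A))  # True for modest sizes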

@mattip
Member
mattip commented Aug 18, 2019

Duplicate of #5533, #5906, and perhaps others. Closing since #5533 suggests some solutions to the 32-bit limit in LAPACK routines.

@mattip closed this as completed Aug 18, 2019