-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
Numpy inverse of very large matrix returns all-zero matrix without error #5898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Just tested with the MKL-free numpy and have the same exact problem and behavior.
I've also tested with conda's numpy 1.8.2: same problem. Then I tried with ubuntu Ubuntu 14.04.2 LTS's python+numpy (apt-get that installed python-numpy 1:1.8.2-0ubuntu0.1): I have the same exact problem every time. |
Is |
It appears that with N=46000 it works, but it starts failing with N=47000. A gentleman on IRC suggested to use scipy's linalg.inv function and see if that worked. It appears to be working (at least it does not return a full zero matrix within a few seconds). Still need to verify the output makes sense. |
Does it work with N=46335 but fail with N=46345? Sounds like a bug in On Wed, May 20, 2015 at 3:05 PM, Sylvain R notifications@github.com wrote:
Nathaniel J. Smith -- http://vorpus.org |
If I were a gambler I might bet it has something to do with this: >>> 46340**2 < 1<<31 < 46341**2
True |
^what @njsmith said |
@argriffing: you are spot on, it fails with Definitely looks like a 32 bit variable bug! For now I'll use the |
I'm also curious if |
@argriffing: I reverted my numpy to numpy-1.6.2 (thanks conda) and ran my test script. Looks like it works (i.e. it does not return a 0 filled matrix within a few seconds.... it has been running on all my cores for a few minutes now, which typically indicates success). Will try to run your PR (if I figure how to do it). |
If you have a working dev environment, then something like this: On Wed, May 20, 2015 at 9:24 PM, Sylvain R notifications@github.com wrote:
Nathaniel J. Smith -- http://vorpus.org |
@njsmith thanks for your help, I gave it a shot but end up with a segfault when I try to invert a large matrix (works for small ones, cf log bellow):
|
@ovillellas @pv: any ideas? On Wed, May 20, 2015 at 10:00 PM, Sylvain R notifications@github.com
Nathaniel J. Smith -- http://vorpus.org |
>>> X = np.linalg.inv(A)
Segmentation fault lol. I guess the PR didn't completely fix the problem. Does this mean it's hitting a similar problem somewhere else, maybe within the gesv call itself? Here's a MAGMA thread that mentions this kind of limit in LAPACK/BLAS implementations. The LP64 vs. ILP64 terminology in that thread refers to the data model. There's an MKL page related to this distinction, so maybe it would be possible to build an MKL BLAS/LAPACK that would not segfault, assuming the error is not within the numpy code? Edit: Here's a blog post mentioning numpy, scipy, MKL, and the ILP64 interface. |
Note that numpy dot also has the problem. It is possible to fix it there by avoiding cblas for huge arrays. |
The way it works for the CBLAS API is terrible, and I think LAPACK may
|
@sylvainr What if you use the PR that segfaulted, but also use MKL (without fiddling with LP64/ILP64)? The idea is that if MKL is implemented more intelligently than the GESV implementation that is hardcoded into numpy, then it could be possible that the MKL implementation could invert a large nxn matrix without segfaulting if it is careful about not overflowing integers internally, even if it uses a 32 bit integer to specify the number |
For what it's worth, I was able to Edit: Here's an open issue related to the automatic float32->float64 conversion: #3722 |
All the open questions for numpy linalg are related to extending the API, and fixing legacy behavior. For example, the cast to float64 is legacy behavior, and changing that would break backward compat so it was purposefully not done. So nothing out of the ordinary here, IMHO. The part that is merged is complete. |
@argriffing I wont be able to build numpy with MKL as I don't have MKL's SDK. I have been running the MKL methods through anaconda so far. |
@sylvainr: so to summarize --- the status is that it works with Scipy from Anaconda, which is linked against MKL. However, the Numpy you build is linked against the BLAS/LAPACK library that happens to be installed (check |
@pv That should answer your question:
Looks like it may be using the buggy I am not familiar as to how to tell Numpy which backend lib to use (note that I don't have anyone installed). I am on Ubuntu, should installing |
@sylvainr: and for the Numpy from Anaconda, does it says it's using MKL? On ubuntu, you can install |
Do I need to do some sort of I am still getting a seg fault.
|
This is what I get from conda's python:
|
@sylvainr: looks like you are missing |
Just did that, looks like it is not seg faulting right away this time! I even have all my cores cranking up, indicating it's computing. My config now:
|
90 mins later, I confirm that it worked:
|
Indeed, here's an active issue scikit-learn/scikit-learn#4197 where The corresponding numpy issue is #5533. In that thread you can see that I was unable to reproduce the segfault using OpenBLAS, but it was reported to have segfaulted with the Accelerate implementation of BLAS/LAPACK (mac). |
I have been experimenting with large matrix inversions (have to inverse the whole matrix is my specific case) to check the runtime. It all works well until I tried to inverse a 50000 by 50000 matrix:
The run time of ~6 seconds is very small. But more importantly: I should not have the inv() function to return an all-zero matrix without any error!
Am I missing something here? Is it a result I should be expecting?
About my environment:
Right before running the snippet above, I made sure I had enough RAM:
Running Conda 3.12.0 with:
numpy 1.9.2 py27_p0 [mkl]
OS: Linux seftrtools1 3.16.0-34-generic #47~14.04.1-Ubuntu SMP Fri Apr 10 17:49:16 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
The text was updated successfully, but these errors were encountered: