numpy.dot crash with numpy.float32 input #4007
Most likely it is a crash in the BLAS library being used.
If it's OpenBLAS, please make sure it is the latest version from git; anything older tends to crash a lot. |
It is vecLib that comes with OS X 10.9. |
@fbkarsdorp: and your Numpy version is? |
1.9.0.dev-a3e8c12 |
Does it also crash with NumPy 1.7 and 1.8 (make sure to also use vecLib)? Please also provide a backtrace with gdb:
|
OS X 10.9 was released October 22, and it could very well have library/Xcode 5.0.1 problems. @fbkarsdorp What compiler was used for the NumPy installation? Was Xcode also updated? |
Yes, I had Xcode 5.0.1 installed, and I believe I used the following compiler: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 |
I ran the following testcase:
And this is the log: [@pv/EDIT long traceback moved to a gist: https://gist.github.com/pv/7307192 ] |
On Mac, gdb stops at the beginning of each run; you need to type |
Just want to confirm the issue. I am in exactly the same situation as fbkarsdorp, with the same configuration and versions (Xcode, compiler, Python and NumPy), and I get exactly the same error. |
Please try to obtain a gdb traceback of the SIGSEGV. |
On Mavericks there's no gdb; there's lldb. |
I'm having the same issue. Does anyone know if there is a workaround or fix? OS X 10.9 |
@andytwigg, can you try to run gdb (juliantaylor explained how further up)? I doubt this is an issue in NumPy, but a debugger run may tell us more. I'm not sure about a workaround: the example seems to be a matrix-vector product, for which NumPy probably calls gemv. Maybe making the second operand a matrix too, and playing around with transposing, would help (transposing can only make a difference in NumPy >= 1.8). |
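A minimal sketch of the "make the second a matrix, too" idea, assuming the goal is just to route a single-precision matrix-vector product through a matrix-matrix product. The helper name `matvec_as_matmat` is hypothetical. It pads the vector with a zero column because NumPy may still treat an (n, 1) array as a vector; whether this actually avoids the crashing BLAS call depends on NumPy's internal dispatch and is not confirmed in this thread.

```python
import numpy as np


def matvec_as_matmat(A, x):
    """Compute A @ x through a matrix-matrix product (hypothetical workaround).

    x is padded with a zero column so the second operand is a genuine
    (n, 2) matrix; a single (n, 1) column may still be treated as a vector.
    """
    A = np.asarray(A)
    x = np.asarray(x)
    X = np.zeros((x.shape[0], 2), dtype=x.dtype)
    X[:, 0] = x  # first column holds the vector, second column stays zero
    return np.dot(A, X)[:, 0]


rng = np.random.RandomState(0)
A = rng.rand(200, 200).astype(np.float32)
x = rng.rand(200).astype(np.float32)
y = matvec_as_matmat(A, x)
```

The cost is one wasted zero column per call, so this is a stopgap rather than a fix.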
Can folks having this problem check whether it persists with the latest Xcode? I think Xcode is at 5.0.2. |
I'll try it tomorrow and let you know. I'm sorry I can't give you a debugger dump, but it's all LLVM on 10.9 and I don't really know how the debugger works here. On 16 Feb 2014, at 00:01, Charles Harris notifications@github.com wrote:
|
I tried with the same setup that was giving me the error (the gensim word2vec model, with the numpy.dot() call on line 435 here https://github.com/piskvorky/gensim/blob/develop/gensim/models/word2vec.py ). Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54)
If you need it, I can give you the vectors.bin file to reproduce this, but you can also find it on the word2vec website (or a short description of how to obtain it from some data). On 16 Feb 2014, at 00:43, w4nderlust w4nderlust@gmail.com wrote:
|
I'm also having the same problem, with this error:
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Thread 0 Crashed:: Dispatch queue: com.apple.root.default-priority
Binary images: |
I can reproduce on OSX 10.9 with numpy 1.8.1 (the wheel package from pypi) on Python 3.4 from python.org.
Here is the backtrace from the OSX crash tool:
Note that I had to run the reproduction code 3 times before getting the crash (it is not deterministic) and that it does not happen when I replace 200 by 100 or less. |
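The non-determinism and the size threshold are consistent with the later discussion of misaligned data: whether the crash fires may depend on where the allocator happens to place the buffer. Assuming that, a test can be made repeatable by placing float32 data at a chosen byte offset from an aligned boundary. The helper `float32_at_offset` and the 32-byte boundary are assumptions for illustration, not taken from the reproduction code in this thread.

```python
import numpy as np


def float32_at_offset(n, offset, boundary=32):
    """Return a float32 array of length n whose data starts `offset` bytes
    past a `boundary`-byte aligned address (offset < boundary).

    The 32-byte default is an assumption, not something stated in the thread.
    """
    itemsize = np.dtype(np.float32).itemsize
    # Over-allocate raw bytes so the start can slide to any offset.
    raw = np.empty(n * itemsize + boundary + offset, dtype=np.uint8)
    # Bytes needed to reach the next aligned address, plus the requested offset.
    start = (-raw.ctypes.data) % boundary + offset
    return raw[start:start + n * itemsize].view(np.float32)


x = float32_at_offset(200, offset=4)
x[:] = np.arange(200, dtype=np.float32)
```

Using such an array as the vector in `np.dot(A, x)` with a float32 `A` then exercises the same memory layout on every run instead of depending on luck.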
Unless someone can reproduce the issue with a BLAS other than Accelerate, this seems to be an Apple issue and should be reported to Apple (or whoever develops Accelerate). |
I think so too. I cannot reproduce the issue with numpy master built with OpenBLAS. Also on numpy built with Accelerate: I fixed the seed with |
Ran into the same problem on Mavericks; compiling numpy with openblas (after getting openblas with homebrew) fixed the problem. |
Should we provide a numpy binary wheel with openblas instead of linking to Accelerate? I believe most OSX users are on Mavericks already: https://www.adium.im/sparkle/#osVersion |
@matthew-brett this was discussed a few times on the mailing list, and there were several saying "please don't do that to us". OpenBLAS will come with its own issues, so I think fixing this issue should be preferred. We managed to do that in scipy, we should manage here I'd think. |
If you have the choice between Accelerate and OpenBLAS, I'd choose OpenBLAS. Accelerate also has fork issues (which are now fixed in pthread OpenBLAS), and its bugs apparently don't get fixed at all, nor can they be fixed by the community, because it's closed source (or is it open?). |
As for the macro magic, I'm only doing that because I have to replace so many functions. Again, all you do is declare a static variable within which you will store a pointer to the original |
And if your code never creates misaligned data, you'll never even call |
Scratch that. Of course, you'll call |
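The point about misaligned data can be illustrated at the Python level: check the data pointer, and only pay for a copy into an aligned buffer when the check fails, so aligned inputs never touch the fallback. This is a hypothetical sketch (`is_aligned`, `aligned_copy`, `guarded_dot`, and the 32-byte boundary are all assumptions), not the C patch being discussed, which replaces the BLAS functions themselves.

```python
import numpy as np

BOUNDARY = 32  # assumed alignment requirement; not stated in the thread


def is_aligned(a, boundary=BOUNDARY):
    """True if the array's data pointer sits on a `boundary`-byte address."""
    return a.ctypes.data % boundary == 0


def aligned_copy(a, boundary=BOUNDARY):
    """Copy `a` into a freshly allocated buffer that starts on `boundary`."""
    a = np.asarray(a)
    raw = np.empty(a.nbytes + boundary, dtype=np.uint8)
    start = (-raw.ctypes.data) % boundary
    out = raw[start:start + a.nbytes].view(a.dtype).reshape(a.shape)
    out[...] = a  # copies element-wise, whatever the source strides are
    return out


def guarded_dot(A, x):
    """Hypothetical guard: only misaligned operands are copied before np.dot."""
    if not is_aligned(A):
        A = aligned_copy(A)
    if not is_aligned(x):
        x = aligned_copy(x)
    return np.dot(A, x)
```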
The failure mode is independent of transpositions. |
I am not sure if calling |
You only call loadsym once. You just have to test the pointer every time. On Oct 11, 2014, at 3:52 PM, Sturla Molden <notifications@github.com> wrote: I am not sure if calling loadsym most of the time is better than calling cblas_sgemm all of the time. |
Thanks, that makes sense. |
Ok, now I understand your code, except the strange VOIDS(n) macro. |
Apple is stalling, so here is a work-around based on an improved version of my previous fix, slightly inspired by libVecFort. Also we cannot rely on Apple users to actually have Apple's bugfix installed, whenever it might show up. |
I tried pinging Apple devs ("community evangelists") about that ticket a few days ago, but no response. Apparently, Apple devs give priority to issues/tickets based on how many people report the same problem. If that's true, a single bug report doesn't stand much chance... |
I got a reply to my bug report a couple of days ago claiming that this was fixed in Yosemite, and asking me to check, but I haven't got Yosemite. If anyone does have Yosemite, please do check with Julian's code in #4007 (comment) (compiled with In any case, there are going to be a large number of people with this bug for a long time, so I think we would still need to work around it. |
Just ran it on Yosemite: "0 16 0". |
That's very good news. I tried my old test cases on Yosemite and they indeed no longer crash. |
@mcg1969 Here is a full patch for all fail modes. You might take a look at it for libVecFort: |
This should be worked around in the 1.9 branch now. |
It has been a pleasure to see y'all working so well on this. Nicely done everyone. |
Sturla - a particular shout-out to you, thanks very much for your patience and hard work on this one. |
Here is more of the same, and I still have NumPy master (1.10.x) to fix. |
@sturlamolden I'm inclined to wait on Apple to get their act together before applying a fix to numpy master. A PR while the work is fresh in your mind would be good, but it will sit for a while. Don't mean to discourage you, though ;) |
I think I should do this while I have it in my head, then you can sit on it as long as you want :-) A fix for NumPy master would be more like the ones for SciPy though, i.e. putting it in I cannot imagine why Apple would not want to fix a segfault in |
I got a reply from the Apple engineers on my bug report:
|
Bummer. |
A user of gensim, @fbkarsdorp, reported a crash (segfault) with NumPy: piskvorky/gensim#131
The crash seems to have nothing to do with gensim, so I'm transferring the issue here. It happens in `dot` of a matrix*vector product in single precision, on his OS X Mavericks.