Multiarray Double Free or Unmap Pointer error for huge datasets #2995
Comments
That's a pretty old version of NumPy; any chance you can test with the latest release?
I tested it on a machine with NumPy 1.7.0. This time, I just got a segmentation fault without any output to STDOUT or STDERR. I ran it through GDB, and got the following output:
Do you know what the issue is? I will try my best to recompile NumPy with debugging symbols so that it can be more informative, but perhaps you have more insight based on the above than I do.
Frame #2 is where the call to [...]
@pv IIRC, there is also a memory limitation with sparse matrices related to the Fortran code being compiled with 32-bit integers as indices.
@charris Yes, if I just comment out [...]. I unraveled this mathematical operation to try and see where exactly it breaks. If I replace the [...]. I am currently unable to install Valgrind on the machine that has NumPy 1.7, but if you think it would be useful to see, let me know and I'll see what I can do. To be honest, the data passed in through the first argument is not too large (497 MB) and I can send you a pointer to it, but unless you have a large-memory machine you may not be able to replicate my error when running this code.
I suspect the problem is with the sparse code; @pv can help you much more there than I can. It might be worth testing the SciPy 0.12 beta that was just released to see if the problem persists. If you search the scipy list for sparse, dense, multiplication, memory, and related terms, I'm pretty sure you will find some relevant posts.
Another question: you say matrix multiplication, whereas the '*' operator is normally element-wise multiplication. Are you using the numpy matrix class by any chance?
@charris Yes, I want to do matrix multiplication, not element-wise multiplication. Since W is a sparse matrix, I thought that [...]
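For context, a small generic illustration of the operator semantics being discussed (this is example code, not the script from this report): '*' is element-wise for plain ndarrays, but performs matrix multiplication for SciPy's sparse matrix classes.

```python
# Generic illustration only: '*' semantics for ndarrays vs. SciPy sparse matrices.
import numpy as np
import scipy.sparse as sp

a = np.array([[1, 2], [3, 4]])
print(a * a)     # element-wise product for plain ndarrays: [[1 4], [9 16]]

s = sp.csr_matrix(a)
print(s * a)     # matrix product, since sparse matrices use matrix semantics for '*'
```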
@charris: using sparse matrices with [...]. If W is very sparse without special structure, then [...].

@asaluja: check the number [...]
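For context, a minimal sketch of how the relevant quantities can be inspected on a SciPy CSR matrix (the matrix below is a small random stand-in, not the data from this report):

```python
# Sketch: inspect the stored-nonzero count and the index dtypes of a CSR matrix,
# the quantities tied to the 32-bit index limitation discussed above.
import numpy as np
import scipy.sparse as sp

W = sp.rand(10000, 10000, density=1e-4, format='csr')   # small random stand-in

print(W.nnz)                             # number of stored nonzero entries
print(W.indptr.dtype, W.indices.dtype)   # internal index arrays (32-bit in SciPy of this era)
print(np.iinfo(np.int32).max)            # 2147483647, the signed 32-bit ceiling
```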
@pv I think you have identified the issue here. While W is very sparse, all the rows are non-zero. In this particular example, [...]
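A rough arithmetic check of the scale involved (a sketch; the figures 2.2 million and 2000 are taken from the issue description further down, and treating the signed 32-bit limit as the relevant ceiling follows the comments above):

```python
# Back-of-the-envelope check: the dense product has about size x numlabels entries.
import numpy as np

size, numlabels = 2200000, 2000
entries = size * numlabels               # 4,400,000,000 elements in the result
print(entries > np.iinfo(np.int32).max)  # True: more than a signed 32-bit index can address
```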
@asaluja: yes, it is unfortunate, but the situation right now is that nobody has been interested enough in this to push through support for 64-bit nnz. There is some experimental work on this here: http://projects.scipy.org/scipy/ticket/1307 and https://github.com/pv/scipy-work/commits/ticket/1307. You may try it out if you are interested. Alternatively, there is also a second sparse matrix library for Python, http://pysparse.sourceforge.net/, but I'm not sure whether it is better in this respect. This discussion should be moved to http://projects.scipy.org/scipy/ticket/1307, since this looks like a bug in SciPy, not in NumPy arrays.
OK, thanks @pv. I had already opened a ticket on SciPy (http://projects.scipy.org/scipy/ticket/1846); maybe you can consolidate it with the ticket you linked to? Thanks!
Hello NumPy team,
I seem to have encountered some unusual behavior when running NumPy/SciPy on massive datasets. Please see below for the error. Also, this ticket/bug report has been cross-posted on stackoverflow.com (http://stackoverflow.com/questions/14906962/python-double-free-error-for-huge-datasets) and the SciPy developers list (http://projects.scipy.org/scipy/ticket/1846). It was upon the suggestion of an SOer that I decided to post here and on the SciPy dev list.
I am running Linux x86-64-bit OpenSuSE 11.4, NumPy version 1.5.1, SciPy version 0.9.0, Python 2.7.
I have a very simple script in Python, but for some reason I get the following error when running it on a large amount of data:
I am used to these errors coming up in C or C++, when one tries to free memory that has already been freed. However, by my understanding of Python (and especially the way I've written the code), I really don't understand why this should happen.
Here is the code:
One may ask, why declare a P and a Q numpy array? I simply do that to reflect the actual conditions (as this code is simply a segment of what I actually do, where I need a P matrix and declare it beforehand).
I have access to a 192GB machine, and so I tested this out on a very large SciPy sparse matrix (2.2 million by 2.2 million, but very sparse, so that's not the issue). The main memory is taken up by the Q, P, and mat matrices, as they are all 2.2 million by 2000 dense matrices (size = 2.2 million, numlabels = 2000). The peak memory usage goes up to 131GB, which comfortably fits in memory. While the mat matrix is being computed, I get the glibc error, and my process automatically goes into the sleep (S) state without deallocating the 131GB it has taken up.
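A minimal sketch of the kind of setup described above (the names W, P, Q, and mat and the exact construction are assumptions based on this description, not the original code, and the dimensions are scaled down so the snippet can run on an ordinary machine):

```python
# Hypothetical, scaled-down reconstruction of the described workload.
import numpy as np
import scipy.sparse as sp

size, numlabels = 10000, 20              # stand-ins for size = 2.2 million, numlabels = 2000

W = sp.rand(size, size, density=1e-4, format='csr')  # very sparse square matrix
P = np.zeros((size, numlabels))          # P declared up front, as described
Q = np.random.rand(size, numlabels)      # dense matrix multiplied against W
mat = W * Q                              # sparse-times-dense product; the step reported to fail
P[:] = mat                               # hypothetical use of the pre-declared P
```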
Given the bizarre (for Python) error (I am not explicitly deallocating anything), and the fact that this works nicely for smaller matrix sizes (around 1.5 million by 2000), I am really not sure where to start to debug this.
As a starting point, I have set "ulimit -s unlimited" before running, but to no avail.
Any help or insight into numpy's behavior with really large amounts of data would be welcome.
Note that this is NOT an out of memory error - I have 196GB, and my process reaches around 131GB and stays there for some time before giving the error above.
As per suggestions, I ran Python with GDB. Interestingly, on one GDB run I forgot to set the stack size limit to "unlimited", and got the following output:
When I set the stack size limit to unlimited, I get the following:
This makes me believe the basic issue is with the numpy multiarray core module (line #4 in the first output and line #18 in the second). I will bring it up as a bug report in both numpy and scipy just in case.
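For reference, a typical way to obtain such a C-level backtrace under GDB (the script name here is a placeholder):

```
$ gdb --args python your_script.py
(gdb) run        # run until the crash
(gdb) bt         # print the backtrace
```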
Has anyone seen this before?