8000 FIX: Clean strides of arrays requested to be contiguous by seberg · Pull Request #2773 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

FIX: Clean strides of arrays requested to be contiguous #2773

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

seberg
Copy link
Member
@seberg seberg commented Nov 30, 2012

Sorry if I spam with this PR, but this is the alternative approach for #2735 that I also mentioned there, and would in a similar form probably also be necessary if it is opted to do a deprecation process to change flags behaviour.

As far as I can see, this should fix all problems found yet, by cleaning up strides when an array (or buffer) is explicitly requested to be contiguous. Of course potentially someone does only check the flags and then relies on clean strides, but at least it is not the case for scipy and sklearn.
It means that some more checks have to be run in these cases, which unfortunately makes np.array(arr, copy=True, order='C') a bit slower if arr is already clean contiguous, but I doubt that is a big issue.

This still lacks tests for the buffer interface, since it seems necessary to write them in C.

@rgommers
Copy link
Member

Not spam. Thanks for tackling this hairy issue @seberg.

@seberg
Copy link
Member Author
seberg commented Dec 1, 2012

So I am wondering a bit about this. I don't think it matters much really, but maybe someone has an opinion or knows why it is as it is. I am tempted to change things such that clean strides for 0-sized arrays actually end in 0. That may seem a little bit odd, but it requires no special case from:

stride = arr.itemsize
# Fortran order here:
for n in range(arr.ndim):
    arr.strides[n] = stride
    stride *= arr.shape[n]

which of course means: arr.shape = (3,0,3) has strides = (1, 3, 0) (fortran order). Which means that the fastest changing dimension is not the first if you count the 0, but that seems no real problem to me. However maybe it confuses ordering when concatenating arrays.

This ensures that when FromArray is asked to ensure a contiguous
array, a view is returned when the strides do not match, even if
the intput array was already contiguous.
When a contiguous array is explicitely requested, make sure that
strides are cleaned up if necessary. This unfortunatly removes
some shortcuts though.
After a shape[i] == 0, the strides will be 0 (while before it was
treated like shape[i] == 1). This removes the special case, on the
down side, the stride order is not as clear due to that.
The same was true for the amount of memory is allocated for 0-sized arrays.
@seberg
Copy link
Member Author
seberg commented Dec 1, 2012

Thanks to charris for pushing that other fix so fast (and thus making that change not segfault the tests), I just added the change of how to handle strides in 0-sized arrays for the moment.

@erg
Copy link
erg commented Dec 6, 2012

Might as well merge this so my scikit-learn stops crashing? ;)

}
if ((order & PyBUF_C_CONTIGUOUS) == PyBUF_C_CONTIGUOUS) {
for (k = PyArray_NDIM(arr) - 1; k >= 0; --k) {
info->shape[k] = PyArray_DIMS(arr)[k];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use PyArray_DIM(arr, k) here. Although I think the current code is actually more informative than the macro. Hmm...

@seberg
Copy link
Member Author
seberg commented Dec 6, 2012

Thanks for review charris.

@erg could you double check that this actually fixes sk-learn? Also do you have an opinion if this is really or a good approach considering that there still may be broken code out there after this. (also you could -- and in my opinion should -- change it in sklearn since dividing by itemsize is more explicit anyway :p)

What about that change with 0-strides for 0-sized arrays? I could easily revert that (and write that fix for memory allocation differently). There may have been a good reason why this was always special cased, but it seems to be done like that since the dawn of time.

Tests for the buffer interface are still missing and I am honestly not sure where to put and how to write them or is there a way to request C/F-contiguous buffers from python?

@seberg
Copy link
Member Author
seberg commented Dec 6, 2012

I hope the new doc strings are more clear, I always find it difficult to know if things are since I already know what it is about...

@erg
Copy link
erg commented Dec 6, 2012

@seberg This doesn't actually fix it. In a hurry atm...

@seberg
Copy link
Member Author
seberg commented Dec 6, 2012

@erg you are right :(, I should have checked more careful. Cython does use the buffer interface, but if you write arr.strides Cython does access the original object, which like this does not get cleaned up and if mode='c' is given, it throws an error. Sorry for that, I really thought this should do the trick.

@seberg
Copy link
Member Author
seberg commented Dec 6, 2012

But this is bloody annoying, because Cython only checks contiguity by trying to create a contiguous buffer, unless you want to change in-place the original arrays stride, there is no way of performing clean up upon request. This however also means that it is impossible to deprecate current cython generated C-code, because we cannot mess with the buffer interface. So just to note, unless someone has some idea how to continue this, I think I just gave up (great pick of day to do so ;)) for the time being (I will report to the cython guys and ask them if they don't want to point .strides to the buffer strides if thats possible). This is because I really don't know how to continue without breaking sk-learn style buggy implementations unless you wait long enough for a new cython and such.

Oh well, at least this stuff found a few small bugs...

@charris
Copy link
Member
charris commented Dec 6, 2012

Hi Sebastian,

I think this was a worthwhile effort and that it would be nice if you could write up a summary of the experience, i.e., where the problems are, what end users shouldn't do, and what you were aiming for. I'm a bit late to this discussion and not up to speed on the details. One thing I started wondering about is if there is a useful distinction to be made between contiguous memory and F/C contiguous memory. The latter implies the former, but not vice versa, but perhaps some applications just need contiguous memory.

@seberg
Copy link
Member Author
seberg commented Dec 7, 2012

So I actually wrote a bit, not sure how useful it is though but if anyone actually wants to read such thing: https://gist.github.com/4234989 but maybe these flags are not that important in any case...

I plan on adding a bit info to the numpy documentation when it is clear what exactly the final state is (and warnings not to do assumptions about strides). Also might look into adding some strides cleaning, for example for slicing.

I have no idea if there is any use for knowing if the memory is in a single block or not...

@seberg
Copy link
Member Author
seberg commented Jan 6, 2013

Closing this, since it is probably not the right approach and it can be reopened in any case...

@seberg seberg closed this Jan 6, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0