FIX: Clean strides of arrays requested to be contiguous #2773

seberg · 2012-11-30T13:52:24Z

Sorry if I spam with this PR, but this is the alternative approach for #2735 that I also mentioned there, and would in a similar form probably also be necessary if it is opted to do a deprecation process to change flags behaviour.

As far as I can see, this should fix all problems found yet, by cleaning up strides when an array (or buffer) is explicitly requested to be contiguous. Of course potentially someone does only check the flags and then relies on clean strides, but at least it is not the case for scipy and sklearn.
It means that some more checks have to be run in these cases, which unfortunately makes np.array(arr, copy=True, order='C') a bit slower if arr is already clean contiguous, but I doubt that is a big issue.

This still lacks tests for the buffer interface, since it seems necessary to write them in C.

rgommers · 2012-11-30T19:23:34Z

Not spam. Thanks for tackling this hairy issue @seberg.

seberg · 2012-12-01T22:13:39Z

So I am wondering a bit about this. I don't think it matters much really, but maybe someone has an opinion or knows why it is as it is. I am tempted to change things such that clean strides for 0-sized arrays actually end in 0. That may seem a little bit odd, but it requires no special case from:

stride = arr.itemsize
# Fortran order here:
for n in range(arr.ndim):
    arr.strides[n] = stride
    stride *= arr.shape[n]

which of course means: arr.shape = (3,0,3) has strides = (1, 3, 0) (fortran order). Which means that the fastest changing dimension is not the first if you count the 0, but that seems no real problem to me. However maybe it confuses ordering when concatenating arrays.

This ensures that when FromArray is asked to ensure a contiguous array, a view is returned when the strides do not match, even if the intput array was already contiguous.

When a contiguous array is explicitely requested, make sure that strides are cleaned up if necessary. This unfortunatly removes some shortcuts though.

After a shape[i] == 0, the strides will be 0 (while before it was treated like shape[i] == 1). This removes the special case, on the down side, the stride order is not as clear due to that. The same was true for the amount of memory is allocated for 0-sized arrays.

seberg · 2012-12-01T23:06:11Z

Thanks to charris for pushing that other fix so fast (and thus making that change not segfault the tests), I just added the change of how to handle strides in 0-sized arrays for the moment.

erg · 2012-12-06T00:51:34Z

Might as well merge this so my scikit-learn stops crashing? ;)

charris · 2012-12-06T01:14:00Z

numpy/core/src/multiarray/buffer.c

+        }
+        if ((order & PyBUF_C_CONTIGUOUS) == PyBUF_C_CONTIGUOUS) {
+            for (k = PyArray_NDIM(arr) - 1; k >= 0; --k) {
+                info->shape[k] = PyArray_DIMS(arr)[k];


Could use PyArray_DIM(arr, k) here. Although I think the current code is actually more informative than the macro. Hmm...

seberg · 2012-12-06T09:15:48Z

Thanks for review charris.

@erg could you double check that this actually fixes sk-learn? Also do you have an opinion if this is really or a good approach considering that there still may be broken code out there after this. (also you could -- and in my opinion should -- change it in sklearn since dividing by itemsize is more explicit anyway :p)

What about that change with 0-strides for 0-sized arrays? I could easily revert that (and write that fix for memory allocation differently). There may have been a good reason why this was always special cased, but it seems to be done like that since the dawn of time.

Tests for the buffer interface are still missing and I am honestly not sure where to put and how to write them or is there a way to request C/F-contiguous buffers from python?

seberg · 2012-12-06T13:17:45Z

I hope the new doc strings are more clear, I always find it difficult to know if things are since I already know what it is about...

erg · 2012-12-06T16:15:59Z

@seberg This doesn't actually fix it. In a hurry atm...

seberg · 2012-12-06T16:56:03Z

@erg you are right :(, I should have checked more careful. Cython does use the buffer interface, but if you write arr.strides Cython does access the original object, which like this does not get cleaned up and if mode='c' is given, it throws an error. Sorry for that, I really thought this should do the trick.

seberg · 2012-12-06T18:35:43Z

But this is bloody annoying, because Cython only checks contiguity by trying to create a contiguous buffer, unless you want to change in-place the original arrays stride, there is no way of performing clean up upon request. This however also means that it is impossible to deprecate current cython generated C-code, because we cannot mess with the buffer interface. So just to note, unless someone has some idea how to continue this, I think I just gave up (great pick of day to do so ;)) for the time being (I will report to the cython guys and ask them if they don't want to point .strides to the buffer strides if thats possible). This is because I really don't know how to continue without breaking sk-learn style buggy implementations unless you wait long enough for a new cython and such.

Oh well, at least this stuff found a few small bugs...

charris · 2012-12-06T19:14:00Z

Hi Sebastian,

I think this was a worthwhile effort and that it would be nice if you could write up a summary of the experience, i.e., where the problems are, what end users shouldn't do, and what you were aiming for. I'm a bit late to this discussion and not up to speed on the details. One thing I started wondering about is if there is a useful distinction to be made between contiguous memory and F/C contiguous memory. The latter implies the former, but not vice versa, but perhaps some applications just need contiguous memory.

seberg · 2012-12-07T18:13:26Z

So I actually wrote a bit, not sure how useful it is though but if anyone actually wants to read such thing: https://gist.github.com/4234989 but maybe these flags are not that important in any case...

I plan on adding a bit info to the numpy documentation when it is clear what exactly the final state is (and warnings not to do assumptions about strides). Also might look into adding some strides cleaning, for example for slicing.

I have no idea if there is any use for knowing if the memory is in a single block or not...

seberg · 2013-01-06T14:24:00Z

Closing this, since it is probably not the right approach and it can be reopened in any case...

seberg added 5 commits December 1, 2012 23:57

BUG: Ensure clean strides for the buffer interface.

048e448

FIX: Ensure clean strides for FromArray if result is contiguous

0b55276

This ensures that when FromArray is asked to ensure a contiguous array, a view is returned when the strides do not match, even if the intput array was already contiguous.

FIX: Make python array creation functions return clean strides

a3047c6

When a contiguous array is explicitely requested, make sure that strides are cleaned up if necessary. This unfortunatly removes some shortcuts though.

TST: Check that new requested contiguous arrays have clean strides.

a195983

charris reviewed Dec 6, 2012
View reviewed changes

MAINT: Style and documentation fixes.

d686bc1

seberg closed this Jan 6, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

FIX: Clean strides of arrays requested to be contiguous #2773

FIX: Clean strides of arrays requested to be contiguous #2773

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

FIX: Clean strides of arrays requested to be contiguous #2773

FIX: Clean strides of arrays requested to be contiguous #2773

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!