ENH: Make 1-dimensional axes not matter for contiguous flags #2694

seberg · 2012-10-22T11:34:10Z

This PR implements that ie. a 1x3x1 array will be both C- and F-Contiguous (or neither). I have mentioned this before and cleaned up the branch a little against current master and I though I am not completly certain I think it should cover the change fully. I am not sure if the tests should be done differently or more extensive though.

Personally I see only one possible problem with it, and that is external libraries relying on strides[-1] == itemsize (for C-contiguous arrays) or strides[0] == itemsize (for F-Contiguous arrays). They could break in very bad ways then. I doubt this should be common or even occur, but there was code in numpy itself that relied on it.

I have run scipy.test() (admittingly old ubuntu) with this as well and no (new) errors occured even when I filled bogus values for the 1-dim strides in array creation. I have checked numpy test for that as well.

I have also removed some strides cleanup code, I don't really see much of a point to make strides look nice when it doesn't do anything else.

Because I am not sure what the special casing in CreateSortedStridePerm was for and I thought it might be to ensure contiguouity of the new array in some cases, I have reverted that change to it to fix Issue #434 with wrong sorting and included it here since the rest of this commit makes that kind of special casing unnecessary (so I am not sure the fix is quite correct outside of this context).

In lowlevel_strided_loops.h, do not rely that the strides for contiguous arrays are set nicely since they can be considered not important if an axis dimensions is zero.

This is unrelated to what is actually tested here.

This was a check for 1-d arrays, this is the generalization to higher dimension if we allow higher dimensioal arrays to be both f- and c-contiguous.

In a few places ISFORTRAN was used to check for f-contiguouity. This is incorrect, since ISFORTRAN never evaluated to True if the array was also c-contiguous.

Both ravel and asarray misbehaved on 1D (or more generally C and F-contiguous arrays if order='F' was specified. Speed test for .tostring seems not feasably.

This changes UpdateFlags to ignore 1-dimensional axis when setting C-/F-contiguous flags. Updates both flags always now.

This changes ctors.c so that new arrays are created in such a way that they are both C- and F-contiguous if possible. Also fixes some corner cases for 0-sized arrays.

This code is unnecessary with changed flags behavior. It would only serve the purpose of making strides look nicer for the user. The UpdateFlags was only required since 1-dim axis being removed might change contiguous flags. But this cannot happen now.

This reverts changes done to CreateSortedStridePerm in commit 9194b3a. The problem is that it would fail for 3x1x3 Fortran order array for example. And special handleing seems unnecessary at least after 1-dim axis not mattering for contiguous flags. This "closes Issue numpy#434"

njsmith · 2012-10-22T15:16:45Z

numpy/core/src/multiarray/flagsobject.c

+        }
+        PyArray_CLEARFLAGS(ap, NPY_ARRAY_C_CONTIGUOUS);
+        PyArray_CLEARFLAGS(ap, NPY_ARRAY_F_CONTIGUOUS);
+        return;


I think we should be able to just delete these special cases for 0- and 1-d arrays, since it's simpler to just handle them as part of the general case?

njsmith · 2012-10-22T15:29:48Z

Looks good aside from the comments above. The trickiest question is definitely how to handle the compatibility for PyArray_CreateSortedStridePerm. Perhaps there's some way to avoid exposing it in 1.7, or to extract the signature change as a small stand-alone change that could be backported...? Or we could just release 1.7 as is, and then leave the superfluous shape argument in place with a note that it is unused.

Remove unnecessary special cases for 0 and 1-d arrays as suggested by njsmith.

njsmith · 2012-10-25T22:16:08Z

Guess this is good then!

ENH: Make 1-dimensional axes not matter for contiguous flags

rgommers · 2012-11-05T20:24:11Z

This PR did cause an ABI break it seems. Current scipy master segfaults if you build it against 1.6.1 and run the tests against numpy master. See http://thread.gmane.org/gmane.comp.python.scientific.devel/17026 for where.

njsmith · 2012-11-05T21:52:33Z

Does building against numpy master fix the problem? From the stack-trace there, it's not at all clear whether this is an ABI skew problem, or just a bug in master.

seberg · 2012-11-05T22:59:04Z

I also think it should not be binary incompatibility but a simple bug. Probably something that I missed to change. But I have a bit difficulty figuring out what. It has to do object arrays with strides=(0,) and shape=(1,) which are considered contiguous after these changes. The medfilter handles those fines for numerical arrays though, but give any object array it crashes. Do you happen to have any idea what is different for object arrays?

seberg · 2012-11-05T23:55:09Z

Checking with valgrind, the error seems to happen also on other arrays (just doesn't segfault). It may be related to the code block below (I don't have scipy set up for compiling though), since I think it relies on point 1. below.

Now since either numpy or scipy still seems to have code blocks relying on strides[ndim-1] == itemsize, the question is if those are very few and easy to fix, or if they are not rather hard to find and this must be supported.
Numpy could force:

strides[ndim-1] == itemsize for a C-Contiguous array.
strides[0] == itemsize for a F-Contiguous array.

or maybe even change it when setting the contiguous flags, but I don't know if we can just change the users strides (even if it does not change the rest of the array). Is there any reasonable use case of relying on clean strides other then those two cases?

--- a/scipy/signal/sigtoolsmodule.c
+++ b/scipy/signal/sigtoolsmodule.c
@@ -772,8 +772,8 @@ static void fill_buffer(char *ip1, PyArrayObject *ap1, PyArr
   int ndims = ap1->nd;
   intp *dims2 = ap2->dimensions;
   intp *dims1 = ap1->dimensions;
-  intp is1 = ap1->strides[ndims-1];
-  intp is2 = ap2->strides[ndims-1];
+  intp is1 = ap1->descr->elsize;
+  intp is2 = ap2->descr->elsize;
   char *ip2 = ap2->data;
   int elsize = ap1->descr->elsize;
   char *ptr;

(About this code block, I didn't figure out the code, but I think it might at some point subtract elemsize and then add strides[ndims-1] causing the pointer to not land at the correct spot since they differ. But again this is just a guess, as I did not compile it).

seberg · 2012-11-07T16:58:31Z

@rgommers, I have (half successful) compiled the new scipy against old numpy. And the diff above does fix the problem and I think it might as well be changed in scipy, even if we change numpy back. Now I wonder how bad this change is, and it may very well be very bad. It shouldn't occur in the typical external C wrapper (as they wouldn't use strides at all), but it forced a fix in one place in numpy and now one place in scipy.

The scipy build I did seemed to not really run all tests and had some other problems. But in any case if someone wants to test if something is still affected by this change this diff may be a crude way to help, since it makes sure the strides are completely bogus if possible when otherwise they are too likely to just happen to be "right". (it creates a numpy failure, but that does not matter)

diff --git a/numpy/core/src/multiarray/ctors.c b/numpy/core/src/multiarray/ctors.c
index ef6c40b..318deb5 100644
--- a/numpy/core/src/multiarray/ctors.c
+++ b/numpy/core/src/multiarray/ctors.c
@@ -3583,7 +3583,12 @@ _array_fill_strides(npy_intp *strides, npy_intp *dims, int nd, size_t itemsize,
     if ((inflag & (NPY_ARRAY_F_CONTIGUOUS|NPY_ARRAY_C_CONTIGUOUS)) ==
                                             NPY_ARRAY_F_CONTIGUOUS) {
         for (i = 0; i < nd; i++) {
-            strides[i] = itemsize;
+            if (dims[i] != 1) {
+                strides[i] = itemsize;
+            }
+            else {
+                strides[i] = 12839184;
+            }
             if (dims[i]) {
                 itemsize *= dims[i];
             }
@@ -3601,7 +3606,12 @@ _array_fill_strides(npy_intp *strides, npy_intp *dims, int nd, size_t itemsize,
     }
     else {
         for (i = nd - 1; i >= 0; i--) {
-            strides[i] = itemsize;
+            if (dims[i] != 1) {
+                strides[i] = itemsize;
+            }
+            else {
+                strides[i] = 12839184;
+            }
             if (dims[i]) {
                 itemsize *= dims[i];
             }
diff --git a/numpy/core/src/multiarray/flagsobject.c b/numpy/core/src/multiarray/flagsobject.c
index ef04bdb..f284292 100644
--- a/numpy/core/src/multiarray/flagsobject.c
+++ b/numpy/core/src/multiarray/flagsobject.c
@@ -110,6 +110,9 @@ _UpdateContiguousFlags(PyArrayObject *ap)
             PyArray_ENABLEFLAGS(ap, NPY_ARRAY_F_CONTIGUOUS);
             return;
         }
+        if (dim == 1) {
+            PyArray_STRIDES(ap)[i] = 12839184; /* complete nuts value */
+        }
         if (dim != 1) {
             if (PyArray_STRIDES(ap)[i] != sd) {
                 is_c_contig = 0;

stefanv · 2012-11-07T21:30:29Z

It's a bit scary to use arbitrary "magic" values like that, isn't it?

seberg · 2012-11-07T21:56:35Z

@stefanv of course its bogus?! But it is just to try if there is anything broken by the fact that arbitrary values could be set, not anything to actually use, its a hack. I just put in random stuff large enough that I thought it should create a segmentation fault in any normal testing. The value should not matter much anyway that purpose as long as its larger then itemsize.

pv · 2012-11-08T09:13:58Z

Using MAX_INT or another recognizable constant would be better practice --- someone will eventually look at the strides and waste time wondering what is going on.

seberg · 2012-11-08T16:14:44Z

Anyway... What I would like to know if I need to undo the behaviour change from this PR or (at least for now) not.

Considering for example the Buffer PEP http://www.python.org/dev/peps/pep-3118/#access-flags which says:

All of these flags [PyBuf_(C|F|ANY)_CONTIGUOUS] imply PyBUF_STRIDES and guarantee that the strides buffer info structure will be filled in correctly.

Is an array F-Contiguous if it can be addressed in a F-Contiguous manner or must the strides also have the property that:

f_contiguous = True
sd = arr.itemsize
for d in xrange(arr.ndim):
    if arr.strides[d] != sd:
        f_contiguous = False
        break
    sd *= arr.shape[d]

So that they are not just filled in, but that they are also filled in a nice clean way (meaning that a 3x1 array is either C- or F-contiguous and never both which this PR changed), instead of being possible to be seen in such way.

Also, if it should not be really necessary to clean up strides (to which I personally lean, but I think it probably violates for example the PEPs intention) is it nevertheless too dangerous because it could break some code in non obvious ways (as it did with the medfilt function, even if the fix is simple), and its really not worth it to potentially break code for the small advantages liberal flag gives.

stefanv · 2012-11-08T17:23:58Z

A 3x1 can be both, if the values are stored contiguously in memory, no?

seberg · 2012-11-08T18:28:31Z

@stefanv well, it can be viewed as both, but maybe it cannot be both at once. Take a 3x1 boolean array, its strides are either 1x1 (clean C-Contiguous) or 1x3 (clean F-Contiguous). This PR made it that it can be 1x(arbitrary) and still considered both (since you could just change to the clean strides). I am unsure if for example that PEP considers that it should be clean, which means only one is possible. In any case if you allow both at once, IMO you basically say that that scipy medfilt code is buggy, even if I could imagine to enforce sensible strides, such that this kind of bug is avoided even if it would be off-label usage. Such that a 3x1 array with 1x1 strides would still be considered F-Contiguous (because at least strides[0] == itemsize, which is the main pitfall), but probably that just creates unnecessary confusion as well.

njsmith · 2012-11-09T08:41:05Z

@stefanv: In reality, a 3x1 can be both. But before this PR, numpy had its own ideas that do not really match up with reality -- in numpy, multidimensional arrays are never both C- and F-contiguous (even arrays that are, in fact, both). That's what this PR is fixing.

The code in scipy assumes that all C-contiguous arrays also have the property that strides[-1] == itemsize. For an Nx1 array this may or may not be true, since the last stride is never used...

seberg · 2012-11-09T16:01:38Z

I have had a quick look at sklearn too. And there would have to be changes in (at least) these cases:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/seq_dataset.pyx#L78
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L429 (and following, also lines 485 and 491-492)
These should be correct, but I do not know if blas might dislike: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/arrayfuncs.pyx#L89 (if blas dislikes it, it might be a serious point against this)

I will guess I will make PR for sklearn and scipy, since whatever is done here, there is no reason not to change the scipy/sklearn code. But I wanted to note it here to keep an overview of possible problems created by this change.

rgommers · 2012-11-11T11:52:45Z

It's indeed not an ABI issue; with scipy compiled against numpy master this still crashes (for about 1 in 4 runs of scipy.test).

If there's an issue in signaltools then fixing that makes sense of course, but this problem should still be fixed in numpy. Otherwise you get into the situation that the last released versions of numpy and scipy don't work together. That both scipy and sklearn have problems here makes it also likely that the same issue is present in 3rd party code we can't audit.

seberg · 2012-11-12T14:48:48Z

I think its a bit unfortunate, but you are right. So I guess the old behaviour needs to restored? It would be possible to enforce only the strides[-1] == itemsize, which should solve most off-label usage (since calculating sizes from strides would be rather odd, and already now is ill defined for 0-sized arrays), but 8000 it seems a bit like a hack.

I am thinking to make things a bit more strict regarding 0-sized arrays. As its right now possible (not easy) to construct arrays with shape=(3,0) and strides=(arbitrary,0) which are considered C-Contiguous (but would break for example that sklearn code).

If this is reverted, I am wondering if:

we should make a utility function for stride cleanup (and if where to place it). This could be called for indexing with newaxes and for copying with Keeporder (ie. so that np.ones((4,1,4), order='F')[::2,:,:].copy('K') is true F-Contiguous).
It would be useful to create an extra check for np.array(a, order='C', copy=False) to actually create only a view if possible. But thats only worth it if its common to pass 1xN arrays or so to C-Extensions.

rgommers · 2012-11-12T21:33:36Z

Yes. Not sure if all the corner cases should be restored (probably not), but at least scipy and sklearn should work unmodified.

seberg added 11 commits October 21, 2012 17:53

ENH: Do not rely on clean strides for loops.

b46f589

In lowlevel_strided_loops.h, do not rely that the strides for contiguous arrays are set nicely since they can be considered not important if an axis dimensions is zero.

TST: Update ctypes tests for changed contiguous flags setting.

fb64bbf

This is unrelated to what is actually tested here.

API: Change isfortran to f-contig and not c-contig

4b28147

This was a check for 1-d arrays, this is the generalization to higher dimension if we allow higher dimensioal arrays to be both f- and c-contiguous.

BUG: Replace ISFORTRAN by IS_F_CONTIGUOUS.

0e10f55

In a few places ISFORTRAN was used to check for f-contiguouity. This is incorrect, since ISFORTRAN never evaluated to True if the array was also c-contiguous.

TST: Test order='F' combined with C/F-contiguous arrays.

44f83e2

Both ravel and asarray misbehaved on 1D (or more generally C and F-contiguous arrays if order='F' was specified. Speed test for .tostring seems not feasably.

API: Change Flags Updateing to allow C-/F-contiguous arrays

c48156d

This changes UpdateFlags to ignore 1-dimensional axis when setting C-/F-contiguous flags. Updates both flags always now.

API: ctors changed so that contiguous flags ignore 1-dim axis

8daf144

This changes ctors.c so that new arrays are created in such a way that they are both C- and F-contiguous if possible. Also fixes some corner cases for 0-sized arrays.

TST: Check if contiguous flags are correct in various situations

dbe4468

MNT: Minor cleanups in comments and lowlevel_strided_loops.

4d741d2

njsmith reviewed Oct 22, 2012
View reviewed changes

MNT: Remove unnecessary special cases in _UpdateContiguousFlags

02ebf8b

Remove unnecessary special cases for 0 and 1-d arrays as suggested by njsmith.

njsmith mentioned this pull request Oct 23, 2012

BUG: Fix bug with size 1-dims in CreateSortedStridePerm #2696

Merged

njsmith added a commit that referenced this pull request Oct 25, 2012

Merge pull request #2694 from seberg/cflags2

a890a85

ENH: Make 1-dimensional axes not matter for contiguous flags

njsmith merged commit a890a85 into numpy:master Oct 25, 2012

seberg mentioned this pull request Nov 14, 2012

REV: 8daf144 and following. Strides must be clean to be contiguous #2735

Closed

seberg mentioned this pull request Nov 28, 2012

Floating point exception in GradientBoostingClassifier during nosetests on macosx 10.8.2 64bit (and others?) scikit-learn/scikit-learn#1406

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

ENH: Make 1-dimensional axes not matter for contiguous flags #2694

ENH: Make 1-dimensional axes not matter for contiguous flags #2694

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ENH: Make 1-dimensional axes not matter for contiguous flags #2694

ENH: Make 1-dimensional axes not matter for contiguous flags #2694

Uh oh!

Conversation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!