8000 ENH: Make 1-dimensional axes not matter for contiguous flags by seberg · Pull Request #2694 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: Make 1-dimensional axes not matter for contiguous flags #2694

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Oct 25, 2012

Conversation

seberg
Copy link
Member
@seberg seberg commented Oct 22, 2012

This PR implements that ie. a 1x3x1 array will be both C- and F-Contiguous (or neither). I have mentioned this before and cleaned up the branch a little against current master and I though I am not completly certain I think it should cover the change fully. I am not sure if the tests should be done differently or more extensive though.

Personally I see only one possible problem with it, and that is external libraries relying on strides[-1] == itemsize (for C-contiguous arrays) or strides[0] == itemsize (for F-Contiguous arrays). They could break in very bad ways then. I doubt this should be common or even occur, but there was code in numpy itself that relied on it.

I have run scipy.test() (admittingly old ubuntu) with this as well and no (new) errors occured even when I filled bogus values for the 1-dim strides in array creation. I have checked numpy test for that as well.

I have also removed some strides cleanup code, I don't really see much of a point to make strides look nice when it doesn't do anything else.

Because I am not sure what the special casing in CreateSortedStridePerm was for and I thought it might be to ensure contiguouity of the new array in some cases, I have reverted that change to it to fix Issue #434 with wrong sorting and included it here since the rest of this commit makes that kind of special casing unnecessary (so I am not sure the fix is quite correct outside of this context).

seberg added 11 commits October 21, 2012 17:53
In lowlevel_strided_loops.h, do not rely that the strides for
contiguous arrays are set nicely since they can be considered not
important if an axis dimensions is zero.
This is unrelated to what is actually tested here.
This was a check for 1-d arrays, this is the generalization to
higher dimension if we allow higher dimensioal arrays to be both
f- and c-contiguous.
In a few places ISFORTRAN was used to check for f-contiguouity.
This is incorrect, since ISFORTRAN never evaluated to True if
the array was also c-contiguous.
Both ravel and asarray misbehaved on 1D (or more generally
C and F-contiguous arrays if order='F' was specified. Speed
test for .tostring seems not feasably.
This changes UpdateFlags to ignore 1-dimensional axis when
setting C-/F-contiguous flags. Updates both flags always now.
This changes ctors.c so that new arrays are created in such a way
that they are both C- and F-contiguous if possible. Also fixes
some corner cases for 0-sized arrays.
This code is unnecessary with changed flags behavior. It would
only serve the purpose of making strides look nicer for the
user.
The UpdateFlags was only required since 1-dim axis being removed
might change contiguous flags. But this cannot happen now.
This reverts changes done to CreateSortedStridePerm in commit
9194b3a. The problem is that it would fail for 3x1x3 Fortran order
array for example. And special handleing seems unnecessary at least
after 1-dim axis not mattering for contiguous flags.
This "closes Issue numpy#434"
}
PyArray_CLEARFLAGS(ap, NPY_ARRAY_C_CONTIGUOUS);
PyArray_CLEARFLAGS(ap, NPY_ARRAY_F_CONTIGUOUS);
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be able to just delete these special cases for 0- and 1-d arrays, since it's simpler to just handle them as part of the general case?

@njsmith
Copy link
Member

njsmith commented Oct 22, 2012

Looks good aside from the comments above. The trickiest question is definitely how to handle the compatibility for PyArray_CreateSortedStridePerm. Perhaps there's some way to avoid exposing it in 1.7, or to extract the signature change as a small stand-alone change that could be backported...? Or we could just release 1.7 as is, and then leave the superfluous shape argument in place with a note that it is unused.

Remove unnecessary special cases for 0 and 1-d arrays as suggested
by njsmith.
@njsmith
Copy link
Member
njsmith commented Oct 25, 2012

Guess this is good then!

njsmith added a commit that referenced this pull request Oct 25, 2012
ENH: Make 1-dimensional axes not matter for contiguous flags
@njsmith njsmith merged commit a890a85 into numpy:master Oct 25, 2012
@rgommers
Copy link
Member
rgommers commented Nov 5, 2012

This PR did cause an ABI break it seems. Current scipy master segfaults if you build it against 1.6.1 and run the tests against numpy master. See http://thread.gmane.org/gmane.comp.python.scientific.devel/17026 for where.

@njsmith
Copy link
Member
njsmith commented Nov 5, 2012

Does building against numpy master fix the problem? From the stack-trace there, it's not at all clear whether this is an ABI skew problem, or just a bug in master.

@seberg
Copy link
Member Author
seberg commented Nov 5, 2012

I also think it should not be binary incompatibility but a simple bug. Probably something that I missed to change. But I have a bit difficulty figuring out what. It has to do object arrays with strides=(0,) and shape=(1,) which are considered contiguous after these changes. The medfilter handles those fines for numerical arrays though, but give any object array it crashes. Do you happen to have any idea what is different for object arrays?

@seberg
Copy link
Member Author
seberg commented Nov 5, 2012

Checking with valgrind, the error seems to happen also on other arrays (just doesn't segfault). It may be related to the code block below (I don't have scipy set up for compiling though), since I think it relies on point 1. below.

Now since either numpy or scipy still seems to have code blocks relying on strides[ndim-1] == itemsize, the question is if those are very few and easy to fix, or if they are not rather hard to find and this must be supported.
Numpy could force:

  1. strides[ndim-1] == itemsize for a C-Contiguous array.
  2. strides[0] == itemsize for a F-Contiguous array.

or maybe even change it when setting the contiguous flags, but I don't know if we can just change the users strides (even if it does not change the rest of the array). Is there any reasonable use case of relying on clean strides other then those two cases?

--- a/scipy/signal/sigtoolsmodule.c
+++ b/scipy/signal/sigtoolsmodule.c
@@ -772,8 +772,8 @@ static void fill_buffer(char *ip1, PyArrayObject *ap1, PyArr
   int ndims = ap1->nd;
   intp *dims2 = ap2->dimensions;
   intp *dims1 = ap1->dimensions;
-  intp is1 = ap1->strides[ndims-1];
-  intp is2 = ap2->strides[ndims-1];
+  intp is1 = ap1->descr->elsize;
+  intp is2 = ap2->descr->elsize;
   char *ip2 = ap2->data;
   int elsize = ap1->descr->elsize;
   char *ptr;

(About this code block, I didn't figure out the code, but I think it might at some point subtract elemsize and then add strides[ndims-1] causing the pointer to not land at the correct spot since they differ. But again this is just a guess, as I did not compile it).

@seberg
Copy link
Member Author
seberg commented Nov 7, 2012

@rgommers, I have (half successful) compiled the new scipy against old numpy. And the diff above does fix the problem and I think it might as well be changed in scipy, even if we change numpy back. Now I wonder how bad this change is, and it may very well be very bad. It shouldn't occur in the typical external C wrapper (as they wouldn't use strides at all), but it forced a fix in one place in numpy and now one place in scipy.

The scipy build I did seemed to not really run all tests and had some other problems. But in any case if someone wants to test if something is still affected by this change this diff may be a crude way to help, since it makes sure the strides are completely bogus if possible when otherwise they are too likely to just happen to be "right". (it creates a numpy failure, but that does not matter)

diff --git a/numpy/core/src/multiarray/ctors.c b/numpy/core/src/multiarray/ctors.c
index ef6c40b..318deb5 100644
--- a/numpy/core/src/multiarray/ctors.c
+++ b/numpy/core/src/multiarray/ctors.c
@@ -3583,7 +3583,12 @@ _array_fill_strides(npy_intp *strides, npy_intp *dims, int nd, size_t itemsize,
     if ((inflag & (NPY_ARRAY_F_CONTIGUOUS|NPY_ARRAY_C_CONTIGUOUS)) ==
                                             NPY_ARRAY_F_CONTIGUOUS) {
         for (i = 0; i < nd; i++) {
-            strides[i] = itemsize;
+            if (dims[i] != 1) {
+                strides[i] = itemsize;
+            }
+            else {
+                strides[i] = 12839184;
+            }
             if (dims[i]) {
                 itemsize *= dims[i];
             }
@@ -3601,7 +3606,12 @@ _array_fill_strides(npy_intp *strides, npy_intp *dims, int nd, size_t itemsize,
     }
     else {
         for (i = nd - 1; i >= 0; i--) {
-            strides[i] = itemsize;
+            if (dims[i] != 1) {
+                strides[i] = itemsize;
+            }
+            else {
+                strides[i] = 12839184;
+            }
             if (dims[i]) {
                 itemsize *= dims[i];
             }
diff --git a/numpy/core/src/multiarray/flagsobject.c b/numpy/core/src/multiarray/flagsobject.c
index ef04bdb..f284292 100644
--- a/numpy/core/src/multiarray/flagsobject.c
+++ b/numpy/core/src/multiarray/flagsobject.c
@@ -110,6 +110,9 @@ _UpdateContiguousFlags(PyArrayObject *ap)
             PyArray_ENABLEFLAGS(ap, NPY_ARRAY_F_CONTIGUOUS);
             return;
         }
+        if (dim == 1) {
+            PyArray_STRIDES(ap)[i] = 12839184; /* complete nuts value */
+        }
         if (dim != 1) {
             if (PyArray_STRIDES(ap)[i] != sd) {
                 is_c_contig = 0;

@stefanv
Copy link
Contributor
stefanv commented Nov 7, 2012

It's a bit scary to use arbitrary "magic" values like that, isn't it?

@seberg
Copy link
Member Author
seberg commented Nov 7, 2012

@stefanv of course its bogus?! But it is just to try if there is anything broken by the fact that arbitrary values could be set, not anything to actually use, its a hack. I just put in random stuff large enough that I thought it should create a segmentation fault in any normal testing. The value should not matter much anyway that purpose as long as its larger then itemsize.

@pv
Copy link
Member
pv commented Nov 8, 2012 8000

Using MAX_INT or another recognizable constant would be better practice --- someone will eventually look at the strides and waste time wondering what is going on.

@seberg
Copy link
Member Author
seberg commented Nov 8, 2012

Anyway... What I would like to know if I need to undo the behaviour change from this PR or (at least for now) not.

Considering for example the Buffer PEP http://www.python.org/dev/peps/pep-3118/#access-flags which says:

All of these flags [PyBuf_(C|F|ANY)_CONTIGUOUS] imply PyBUF_STRIDES and guarantee that the strides buffer info structure will be filled in correctly.

Is an array F-Contiguous if it can be addressed in a F-Contiguous manner or must the strides also have the property that:

f_contiguous = True
sd = arr.itemsize
for d in xrange(arr.ndim):
    if arr.strides[d] != sd:
        f_contiguous = False
        break
    sd *= arr.shape[d]

So that they are not just filled in, but that they are also filled in a nice clean way (meaning that a 3x1 array is either C- or F-contiguous and never both which this PR changed), instead of being possible to be seen in such way.

Also, if it should not be really necessary to clean up strides (to which I personally lean, but I think it probably violates for example the PEPs intention) is it nevertheless too dangerous because it could break some code in non obvious ways (as it did with the medfilt function, even if the fix is simple), and its really not worth it to potentially break code for the small advantages liberal flag gives.

@stefanv
Copy link
Contributor
stefanv commented Nov 8, 2012

A 3x1 can be both, if the values are stored contiguously in memory, no?

@seberg
Copy link
Member Author
seberg commented Nov 8, 2012

@stefanv well, it can be viewed as both, but maybe it cannot be both at once. Take a 3x1 boolean array, its strides are either 1x1 (clean C-Contiguous) or 1x3 (clean F-Contiguous). This PR made it that it can be 1x(arbitrary) and still considered both (since you could just change to the clean strides). I am unsure if for example that PEP considers that it should be clean, which means only one is possible. In any case if you allow both at once, IMO you basically say that that scipy medfilt code is buggy, even if I could imagine to enforce sensible strides, such that this kind of bug is avoided even if it would be off-label usage. Such that a 3x1 array with 1x1 strides would still be considered F-Contiguous (because at least strides[0] == itemsize, which is the main pitfall), but probably that just creates unnecessary confusion as well.

@njsmith
Copy link
Member
njsmith commented Nov 9, 2012

@stefanv: In reality, a 3x1 can be both. But before this PR, numpy had its own ideas that do not really match up with reality -- in numpy, multidimensional arrays are never both C- and F-contiguous (even arrays that are, in fact, both). That's what this PR is fixing.

The code in scipy assumes that all C-contiguous arrays also have the property that strides[-1] == itemsize. For an Nx1 array this may or may not be true, since the last stride is never used...

@seberg
Copy link
Member Author
seberg commented Nov 9, 2012

I have had a quick look at sklearn too. And there would have to be changes in (at least) these cases:

  1. https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/seq_dataset.pyx#L78
  2. https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L429 (and following, also lines 485 and 491-492)
  3. These should be correct, but I do not know if blas might dislike: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/arrayfuncs.pyx#L89 (if blas dislikes it, it might be a serious point against this)

I will guess I will make PR for sklearn and scipy, since whatever is done here, there is no reason not to change the scipy/sklearn code. But I wanted to note it here to keep an overview of possible problems created by this change.

@rgommers
Copy link
Member

It's indeed not an ABI issue; with scipy compiled against numpy master this still crashes (for about 1 in 4 runs of scipy.test).

If there's an issue in signaltools then fixing that makes sense of course, but this problem should still be fixed in numpy. Otherwise you get into the situation that the last released versions of numpy and scipy don't work together. That both scipy and sklearn have problems here makes it also likely that the same issue is present in 3rd party code we can't audit.

@seberg
Copy link
Member Author
seberg commented Nov 12, 2012

I think its a bit unfortunate, but you are right. So I guess the old behaviour needs to restored? It would be possible to enforce only the strides[-1] == itemsize, which should solve most off-label usage (since calculating sizes from strides would be rather odd, and already now is ill defined for 0-sized arrays), but it seems a bit like a hack.

I am thinking to make things a bit more strict regarding 0-sized arrays. As its right now possible (not easy) to construct arrays with shape=(3,0) and strides=(arbitrary,0) which are considered C-Contiguous (but would break for example that sklearn code).

If this is reverted, I am wondering if:

  1. we should make a utility function for stride cleanup (and if where to place it). This could be called for indexing with newaxes and for copying with Keeporder (ie. so that np.ones((4,1,4), order='F')[::2,:,:].copy('K') is true F-Contiguous).
  2. It would be useful to create an extra check for np.array(a, order='C', copy=False) to actually create only a view if possible. But thats only worth it if its common to pass 1xN arrays or so to C-Extensions.

@rgommers
Copy link
Member

Yes. Not sure if all the corner cases should be restored (probably not), but at least scipy and sklearn should work unmodified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0