-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: Make 1-dimensional axes not matter for contiguous flags #2694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In lowlevel_strided_loops.h, do not rely that the strides for contiguous arrays are set nicely since they can be considered not important if an axis dimensions is zero.
This is unrelated to what is actually tested here.
This was a check for 1-d arrays, this is the generalization to higher dimension if we allow higher dimensioal arrays to be both f- and c-contiguous.
In a few places ISFORTRAN was used to check for f-contiguouity. This is incorrect, since ISFORTRAN never evaluated to True if the array was also c-contiguous.
Both ravel and asarray misbehaved on 1D (or more generally C and F-contiguous arrays if order='F' was specified. Speed test for .tostring seems not feasably.
This changes UpdateFlags to ignore 1-dimensional axis when setting C-/F-contiguous flags. Updates both flags always now.
This changes ctors.c so that new arrays are created in such a way that they are both C- and F-contiguous if possible. Also fixes some corner cases for 0-sized arrays.
This code is unnecessary with changed flags behavior. It would only serve the purpose of making strides look nicer for the user. The UpdateFlags was only required since 1-dim axis being removed might change contiguous flags. But this cannot happen now.
} | ||
PyArray_CLEARFLAGS(ap, NPY_ARRAY_C_CONTIGUOUS); | ||
PyArray_CLEARFLAGS(ap, NPY_ARRAY_F_CONTIGUOUS); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be able to just delete these special cases for 0- and 1-d arrays, since it's simpler to just handle them as part of the general case?
Looks good aside from the comments above. The trickiest question is definitely how to handle the compatibility for |
Remove unnecessary special cases for 0 and 1-d arrays as suggested by njsmith.
Guess this is good then! |
ENH: Make 1-dimensional axes not matter for contiguous flags
This PR did cause an ABI break it seems. Current scipy master segfaults if you build it against 1.6.1 and run the tests against numpy master. See http://thread.gmane.org/gmane.comp.python.scientific.devel/17026 for where. |
Does building against numpy master fix the problem? From the stack-trace there, it's not at all clear whether this is an ABI skew problem, or just a bug in master. |
I also think it should not be binary incompatibility but a simple bug. Probably something that I missed to change. But I have a bit difficulty figuring out what. It has to do object arrays with |
Checking with valgrind, the error seems to happen also on other arrays (just doesn't segfault). It may be related to the code block below (I don't have scipy set up for compiling though), since I think it relies on point 1. below. Now since either numpy or scipy still seems to have code blocks relying on
or maybe even change it when setting the contiguous flags, but I don't know if we can just change the users strides (even if it does not change the rest of the array). Is there any reasonable use case of relying on clean strides other then those two cases? --- a/scipy/signal/sigtoolsmodule.c
+++ b/scipy/signal/sigtoolsmodule.c
@@ -772,8 +772,8 @@ static void fill_buffer(char *ip1, PyArrayObject *ap1, PyArr
int ndims = ap1->nd;
intp *dims2 = ap2->dimensions;
intp *dims1 = ap1->dimensions;
- intp is1 = ap1->strides[ndims-1];
- intp is2 = ap2->strides[ndims-1];
+ intp is1 = ap1->descr->elsize;
+ intp is2 = ap2->descr->elsize;
char *ip2 = ap2->data;
int elsize = ap1->descr->elsize;
char *ptr; (About this code block, I didn't figure out the code, but I think it might at some point subtract elemsize and then add strides[ndims-1] causing the pointer to not land at the correct spot since they differ. But again this is just a guess, as I did not compile it). |
@rgommers, I have (half successful) compiled the new scipy against old numpy. And the diff above does fix the problem and I think it might as well be changed in scipy, even if we change numpy back. Now I wonder how bad this change is, and it may very well be very bad. It shouldn't occur in the typical external C wrapper (as they wouldn't use strides at all), but it forced a fix in one place in numpy and now one place in scipy. The scipy build I did seemed to not really run all tests and had some other problems. But in any case if someone wants to test if something is still affected by this change this diff may be a crude way to help, since it makes sure the strides are completely bogus if possible when otherwise they are too likely to just happen to be "right". (it creates a numpy failure, but that does not matter) diff --git a/numpy/core/src/multiarray/ctors.c b/numpy/core/src/multiarray/ctors.c
index ef6c40b..318deb5 100644
--- a/numpy/core/src/multiarray/ctors.c
+++ b/numpy/core/src/multiarray/ctors.c
@@ -3583,7 +3583,12 @@ _array_fill_strides(npy_intp *strides, npy_intp *dims, int nd, size_t itemsize,
if ((inflag & (NPY_ARRAY_F_CONTIGUOUS|NPY_ARRAY_C_CONTIGUOUS)) ==
NPY_ARRAY_F_CONTIGUOUS) {
for (i = 0; i < nd; i++) {
- strides[i] = itemsize;
+ if (dims[i] != 1) {
+ strides[i] = itemsize;
+ }
+ else {
+ strides[i] = 12839184;
+ }
if (dims[i]) {
itemsize *= dims[i];
}
@@ -3601,7 +3606,12 @@ _array_fill_strides(npy_intp *strides, npy_intp *dims, int nd, size_t itemsize,
}
else {
for (i = nd - 1; i >= 0; i--) {
- strides[i] = itemsize;
+ if (dims[i] != 1) {
+ strides[i] = itemsize;
+ }
+ else {
+ strides[i] = 12839184;
+ }
if (dims[i]) {
itemsize *= dims[i];
}
diff --git a/numpy/core/src/multiarray/flagsobject.c b/numpy/core/src/multiarray/flagsobject.c
index ef04bdb..f284292 100644
--- a/numpy/core/src/multiarray/flagsobject.c
+++ b/numpy/core/src/multiarray/flagsobject.c
@@ -110,6 +110,9 @@ _UpdateContiguousFlags(PyArrayObject *ap)
PyArray_ENABLEFLAGS(ap, NPY_ARRAY_F_CONTIGUOUS);
return;
}
+ if (dim == 1) {
+ PyArray_STRIDES(ap)[i] = 12839184; /* complete nuts value */
+ }
if (dim != 1) {
if (PyArray_STRIDES(ap)[i] != sd) {
is_c_contig = 0;
|
It's a bit scary to use arbitrary "magic" values like that, isn't it? |
@stefanv of course its bogus?! But it is just to try if there is anything broken by the fact that arbitrary values could be set, not anything to actually use, its a hack. I just put in random stuff large enough that I thought it should create a segmentation fault in any normal testing. The value should not matter much anyway that purpose as long as its larger then itemsize. |
Using MAX_INT or another recognizable constant would be better practice --- someone will eventually look at the strides and waste time wondering what is going on. |
Anyway... What I would like to know if I need to undo the behaviour change from this PR or (at least for now) not. Considering for example the Buffer PEP http://www.python.org/dev/peps/pep-3118/#access-flags which says:
Is an array F-Contiguous if it can be addressed in a F-Contiguous manner or must the strides also have the property that: f_contiguous = True
sd = arr.itemsize
for d in xrange(arr.ndim):
if arr.strides[d] != sd:
f_contiguous = False
break
sd *= arr.shape[d] So that they are not just filled in, but that they are also filled in a nice clean way (meaning that a 3x1 array is either C- or F-contiguous and never both which this PR changed), instead of being possible to be seen in such way. Also, if it should not be really necessary to clean up strides (to which I personally lean, but I think it probably violates for example the PEPs intention) is it nevertheless too dangerous because it could break some code in non obvious ways (as it did with the medfilt function, even if the fix is simple), and its really not worth it to potentially break code for the small advantages liberal flag gives. |
A |
@stefanv well, it can be viewed as both, but maybe it cannot be both at once. Take a |
@stefanv: In reality, a 3x1 can be both. But before this PR, numpy had its own ideas that do not really match up with reality -- in numpy, multidimensional arrays are never both C- and F-contiguous (even arrays that are, in fact, both). That's what this PR is fixing. The code in scipy assumes that all C-contiguous arrays also have the property that strides[-1] == itemsize. For an |
I have had a quick look at sklearn too. And there would have to be changes in (at least) these cases:
I will guess I will make PR for sklearn and scipy, since whatever is done here, there is no reason not to change the scipy/sklearn code. But I wanted to note it here to keep an overview of possible problems created by this change. |
It's indeed not an ABI issue; with scipy compiled against numpy master this still crashes (for about 1 in 4 runs of scipy.test). If there's an issue in signaltools then fixing that makes sense of course, but this problem should still be fixed in numpy. Otherwise you get into the situation that the last released versions of numpy and scipy don't work together. That both scipy and sklearn have problems here makes it also likely that the same issue is present in 3rd party code we can't audit. |
I think its a bit unfortunate, but you are right. So I guess the old behaviour needs to restored? It would be possible to enforce only the I am thinking to make things a bit more strict regarding 0-sized arrays. As its right now possible (not easy) to construct arrays with If this is reverted, I am wondering if:
|
Yes. Not sure if all the corner cases should be restored (probably not), but at least scipy and sklearn should work unmodified. |
This PR implements that ie. a 1x3x1 array will be both C- and F-Contiguous (or neither). I have mentioned this before and cleaned up the branch a little against current master and I though I am not completly certain I think it should cover the change fully. I am not sure if the tests should be done differently or more extensive though.
Personally I see only one possible problem with it, and that is external libraries relying on
strides[-1] == itemsize
(for C-contiguous arrays) orstrides[0] == itemsize
(for F-Contiguous arrays). They could break in very bad ways then. I doubt this should be common or even occur, but there was code in numpy itself that relied on it.I have run
scipy.test()
(admittingly old ubuntu) with this as well and no (new) errors occured even when I filled bogus values for the 1-dim strides in array creation. I have checked numpy test for that as well.I have also removed some strides cleanup code, I don't really see much of a point to make strides look nice when it doesn't do anything else.
Because I am not sure what the special casing in
CreateSortedStridePerm
was for and I thought it might be to ensure contiguouity of the new array in some cases, I have reverted that change to it to fix Issue #434 with wrong sorting and included it here since the rest of this commit makes that kind of special casing unnecessary (so I am not sure the fix is quite correct outside of this context).