MAINT: Make the refactor suggested in prepare_index #8278
Conversation
If this is merged, #4434 will need a rebase.
numpy/core/src/multiarray/mapping.c (Outdated)
```c
 * The index might be a multi-dimensional index, but not yet a tuple
 * this makes it a tuple in that case.
 *
 * TODO: Refactor into its own function.
```
As instructed here
@seberg Comments?
numpy/core/src/multiarray/mapping.c (Outdated)
```c
if (index == NULL) {
    return -1;
index_as_tuple = PySequence_Tuple(index);
if(index_as_tuple == NULL) {
```
Space before (
(also a few more times)
numpy/core/src/multiarray/mapping.c (Outdated)
```c
int ellipsis_pos = -1;

index = prepare_index_tuple(index);
if(index == NULL)
```
space, plus please add the curly brackets, we always put them in numpy
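For reference, the style being asked for here would look roughly like the fragment below - just the pattern (space after `if`, braces even around a single statement), not necessarily the PR's final code:

```c
index = prepare_index_tuple(index);
if (index == NULL) {
    return -1;
}
```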
numpy/core/src/multiarray/mapping.c (Outdated)
```c
else {
    n = PyTuple_GET_SIZE(index);
    n = PyTuple_GET_SIZE(index_as_tuple);
    if (n > NPY_MAXDIMS * 2) {
```
I am a bit curious, this block can probably be removed? Seems to be just an early error out, and I don't see a reason to optimize error speed ;).
It's not immediately clear to me where the late error-out is that this is protecting, but I guess I could remove it and see what breaks?
Well if you pass in a finished tuple you have to get the same error at some point.
This line is the part that handles finished tuples. The only way you can avoid this check is by passing a tuple subclass, which is probably not tested
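For context, the early-out being discussed amounts to something like the sketch below (shape taken from the hunk above; the exact error message raised in the PR is an assumption here):

```c
n = PyTuple_GET_SIZE(index_as_tuple);
if (n > NPY_MAXDIMS * 2) {
    /* fail immediately rather than later, while unpacking each item */
    PyErr_SetString(PyExc_IndexError, "too many indices for array");
    return -1;
}
```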
Could be good to see whether it makes a real speed difference to build that tuple (likely not, but then I may actually have timed it at the time and thought it might be nice to have). Then again, I am not sure whether …
Ref counting looks good on first sight, and yes, definitely all in the right place, the only issue may be to check whether doing a bit weirder code for speed may be worth it.
(force-pushed from fd4a920 to b32815f)
Ok, all the style things are fixed.
I think this is basically testable right now (I cannot build my numpy locally from master or my branch) as:

```python
>>> x = np.arange(1000)
>>> i = 100
>>> %timeit x[i]
The slowest run took 26.21 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 214 ns per loop
>>> %timeit x[(i,)]
The slowest run took 20.53 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 273 ns per loop
```

So as high as 30%, assuming this test is valid.
Hehe, honestly not sure it's worth troubling over. If you replace …
Probably, I am not sure, but somewhat thought those pack functions are not the quickest when it comes to micro optimization. Might also be that it does not make a difference at all.... EDIT: Those buildvalue functions …
Oh ok, nvm then.
There's a weird uncommented check on line 190 that seems to decide that lists of length MAX_DIM are not promoted to tuples, despite the fact that doing so would be valid up to …
Well, the check is only there for the tuple conversion trick, which I am not about to modify, since I think it's a crappy hack in any case. Most of the time, it is …
Any more thoughts on this?
Ok, just trying to pass through a few things. Not really, do you know if the slowdown got better now, or do you think we should just not worry about it too much?
I wasn't really able to profile this patch, being unable to build locally - the closest I could get was running on the released version, comparing:

These all seemed pretty similar, and the timing overhead seems to make the results pretty hard to compare. So I'd say that we shouldn't worry about it.
Ohh, you have problems building locally? Doesn't the …

OK, just ran it on my computer (python3; on python2 things might be different, because it goes through the C getitem function which directly gets the integer, would have to look up the path myself...). The new times first:

```python
In [1]: x = np.arange(1000)

In [2]: i = 100

In [3]: %timeit x[i]
10000000 loops, best of 3: 149 ns per loop / 116 ns

In [4]: x = np.arange(1000, dtype=object)

In [5]: %timeit x[i]
10000000 loops, best of 3: 119 ns per loop / 78.2 ns
```

So, hmmmmmm ;). Can't make up my mind, you are right that the difference is lower than doing something like …
Wait, why is …
Simple, it only has to do an incref and not create the scalar python object from what's inside the array.
I suppose this: I think we can get away with it, but would prefer to check whether it is not too ugly if we try to avoid it.
Ok, timing is now much better, and actually an improvement when passing tuples.

Before this PR:

```python
In [1]: a = np.array([1, 2, 3, 4])

In [2]: %timeit a[0]
The slowest run took 29.59 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 158 ns per loop

In [3]: %timeit a[0,]
The slowest run took 28.01 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 167 ns per loop
```

After the latest commit:

```python
In [1]: a = np.array([1, 2, 3, 4])

In [2]: %timeit a[0]
The slowest run took 32.66 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 157 ns per loop

In [2]: %timeit a[(0,)]
The slowest run took 26.58 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 158 ns per loop
```
numpy/core/src/multiarray/mapping.c (Outdated)
```c
        }
    }
}
PyObject **raw_indices[NPY_MAXDIMS*2];
```
We could also allocate these to the length we need, but there doesn't seem to be much precedent for doing that
I don't think the reference handling code is any different to what was here originally - in both cases, we borrow everything and increment nothing. The previous iteration allocated a tuple, so had to decref it when it was done - but here, we don't bother allocating a new sequence object, and just cache the result of …
No, GET_ITEM (the tuple macro) does return a borrowed reference; the sequence function does not (and cannot) do that. Before, a new tuple was created which would do the reference counting for us (and increment the reference counts). You have to hold on to the reference, since a custom sequence could return new references (say you got an `(x)range(10**3, 10**3+6)`; then the numbers returned will only have a single reference, at least for all you know).

So no, you will need to do reference counting on your "manual tuple", and I suppose we should/could add a test for it.
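A minimal C sketch of the distinction being made here, assuming the `raw_indices` buffer from the hunk above and the `multi_DECREF` helper that appears later in this PR (the length check against `2*NPY_MAXDIMS` is omitted for brevity; this is not the PR's actual code):

```c
/* PyTuple_GET_ITEM returns a *borrowed* reference: the tuple keeps the
 * item alive for us.  PySequence_GetItem returns a *new* reference: a
 * custom sequence may create the object on the fly, so the caller must
 * hold that reference and release it when done. */
PyObject *raw_indices[NPY_MAXDIMS * 2];
Py_ssize_t i, n;

n = PySequence_Size(index);
if (n < 0) {
    return -1;
}
for (i = 0; i < n; i++) {
    PyObject *item = PySequence_GetItem(index, i);  /* new reference */
    if (item == NULL) {
        multi_DECREF(raw_indices, i);  /* release what we already hold */
        return -1;
    }
    raw_indices[i] = item;  /* keep the reference until we are done */
}
/* ... use raw_indices[0..n) ... */
multi_DECREF(raw_indices, n);
```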
numpy/core/src/multiarray/mapping.c (Outdated)
```c
if (commit_to_unpack) {
    return n;
}

```
Just what I think right now (don't worry about it), but I would remove the blank line at least to make the else stick to the if block.
Done, since this was as much accidental as anything else
Damn, you're very right, and that makes this "manual tuple" a much less reasonable thing to work with.
Ok, added reference counting. This is still more performant than master, and definitely has the edge over the same code using normal …
(force-pushed from 3c41f62 to 8c4e556)
numpy/core/src/multiarray/mapping.c (Outdated)
```c
}

/* Passing a tuple subclass - needs to handle errors */
if (PyTuple_Check(index)) {
```
The code here is subtly different to what we had before. Calling `PySequence_Tuple(index)` invokes `__iter__`, whereas this invokes `__getitem__`. So tuple subclasses that implement those methods inconsistently now behave differently. For instance:

```python
class Plus1Tuple(tuple):
    def __getitem__(self, i):
        return super().__getitem__(i) + 1
    # oops, we forgot `__iter__`, and inherited `tuple.__iter__`, which
    # does not fall back on __getitem__
```

gives:

- Before: `a[Plus1Tuple([1, 2, 3])]` → `a[1, 2, 3]` (!)
- After: `a[Plus1Tuple([1, 2, 3])]` → `a[2, 3, 4]`
Can't you just remove this whole block and replace it with `commit_to_unpack = 1`?
OK, plus a check for too many indices.
I am fine with the changed behaviour I think, a tuple subclass should really be OK even with just using PyTuple_GET_ITEM to be honest, otherwise it should be considered badly broken.
Perhaps we should just bite the bullet here and call `tuple(tuplesubclass)`, since efficiency isn't important for this weird case.
Yea, I suppose we might just as well put whatever was there before, it won't make the code any uglier and speed really does not matter. But no need to add a test or so (or if you do, put a comment that it is fine to break it). This is too strange to really worry about.
Ok, changed to call `tuple(tuplesubclass)`, which makes our life a lot easier.
Looked at this more out of curiosity than anything else, so only some clarification requests.
numpy/core/src/multiarray/mapping.c (Outdated)
```c
 * borrowed reference.
 * @param result An empty buffer of 2*NPY_MAXDIMS PyObject* to write each
 *        index component to. The references written are new.
 *        This function will in some cases clobber the array beyond
```
The comments are very helpful, but here maybe be even more explicit and say "beyond the number of items returned".
Will do - I was struggling to phrase this
Ok, I've removed that remark entirely, as it made more sense under the description of the return value:

```c
 * @returns The number of items in `result`, or -1 if an error occured.
 *          The entries in `result` at and beyond this index should be
 *          assumed to contain garbage, even if they were initialized
 *          to NULL, so are not safe to Py_XDECREF.
```
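To illustrate what that contract means for a caller - a sketch only, where `unpack_indices` and `use_index_component` are placeholder names, not necessarily the identifiers used in the PR:

```c
PyObject *raw_indices[NPY_MAXDIMS * 2];
npy_intp i, n;

n = unpack_indices(index, raw_indices);  /* placeholder name */
if (n < 0) {
    return -1;  /* nothing in raw_indices is safe to touch */
}
for (i = 0; i < n; i++) {
    /* only the first n entries hold valid (new) references */
    use_index_component(raw_indices[i]);  /* placeholder */
}
multi_DECREF(raw_indices, n);  /* never Py_XDECREF past index n */
```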
numpy/core/src/multiarray/mapping.c (Outdated)
```c
}

/*
 * For some reason, anything that's long but not too long is turned into
```
This comment confuses me: is it totally unclear why this is done at all? I'd guess that here the point is that it is known that unpacking will fail (well, modulo the factor of 2), but that one should not preclude a very long list of, e.g., integers. If that is not the case, what would fail if one removed this? Should it be deprecated?
> is it totally unclear why this is done at all?

It's totally unclear to me why the 2 is missing. As a result, `x[[None] * 20]` and `x[(None,) * 20]` mean the same thing, yet `x[[None] * 40]` and `x[(None,) * 40]` mean different things (yet neither errors). Of course, someone might be relying on `x[[None] * 40]` meaning `x[[None] * 40,]`, so it's too late to fix it.

The rationale behind the `2*NPY_MAXDIMS` limit elsewhere is that the result is limited to this many dimensions - at best, you can use an `int` to remove every existing dimension, and then `None` to add them back again - so the longest valid index is assumed to be `(0,)*NPY_MAXDIMS + (None,)*NPY_MAXDIMS`. That's not really true either, as it ought to be legal to add an `Ellipsis` in there too...

> Should it be deprecated?

Arguably everything from this point down should be deprecated, as in #4434.
Hmm, a bit of a mess. But I think for someone reading the code later, it may be useful to explicitly mention your last point, i.e., that everything below here should arguably be deprecated (and refer to 4434).
Also done (but higher up than this line)
`x[[None] * 40]` should error? But yes, there are some awful examples, such as using lists of lists as an index. I would prefer things like "for some reason" to be replaced with things like "As described previously, for backwards compatibility" in general; it is, after all, the implementation of the comment just a bit further up. (All of this comes down to that Numeric compat thing! Except of course the 2*N, which I did because I thought "why not allow a bit longer indices just in case someone is crazy, no harm".)
I'll improve that wording
Improve comments [ci skip]
(force-pushed from 6d21f5a to 9832f05)
@seberg: Let me know if the refcounting now looks good, then I'll squash together the 5 most recent commits.
numpy/core/src/multiarray/mapping.c (Outdated)
```c
 * that the longest possible index that is allowed to produce a result is
 * `(0,)*np.MAXDIMS + (None,)*np.MAXDIMS`. This assumption turns out to be
 * wrong (we could add one more item, an Ellipsis), but we keep it for
 * compatibility.
```
You know, the 2* is pretty arbitrary, so you can increment it by one if you like, I just set it as a "high enough" value and yeah, forgot that in principle you can go one higher and still get a valid index.
Actually, there is no limit to the number of valid indices. You can index with `True` or `False` as many times as you like, and the dimensionality will only ever increase by one.
(although in practice, indexing with more than 32 causes problems elsewhere)
Sorry, just thought I would start on this again, don't have much time now so might forget again though, if I do and you want to come back to this, please don't hesitate to ping.
@eric-wieser no, maxdims*2+1 is actually the maximum; if you do None/True you add one, so you end up with at least that many dims ;).
```python
>>> a = np.arange(6).reshape(2, 3)
>>> a[True]
array([[[0, 1, 2],
        [3, 4, 5]]])
>>> a[(True,)*32]
array([[[0, 1, 2],
        [3, 4, 5]]])
>>> a[(True,)*33]
ValueError: Cannot construct an iterator with more than 32 operands (33 were requested)
```
OK, comment seems fine to me, could make it "is based ... longest reasonable index" or so.
Updated
Hehe, right, broadcasting of indices.... anyway, yeah, it's plenty as is.
OK, I think this code is pretty well tested nowadays, so I am fine with merging this. Eric, maybe you can have a glance over yourself one more time and then ping me and I will merge?
```c
@@ -139,6 +139,187 @@ PyArray_MapIterSwapAxes(PyArrayMapIterObject *mit, PyArrayObject **ret, int getm
    *ret = (PyArrayObject *)new;
}

NPY_NO_EXPORT NPY_INLINE void
multi_DECREF(PyObject **objects, npy_intp n)
```
First wasn't sure I like this, but it seems harmless :).
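(The signature above comes from the diff; its body, not shown here, is presumably just a loop along these lines - an assumption, not the PR's exact implementation:)

```c
/* Release n references held in a C array of PyObject pointers. */
NPY_NO_EXPORT NPY_INLINE void
multi_DECREF(PyObject **objects, npy_intp n)
{
    npy_intp i;

    for (i = 0; i < n; i++) {
        Py_DECREF(objects[i]);
    }
}
```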
numpy/core/src/multiarray/mapping.c (Outdated)
```c
}

/* Obvious single-entry cases */
if (0
```
OK, with those `#if`s, formatting it without the 0 is ugly I suppose.
Should optimize out anyway. Could be `if (0  /* to make macros below easier */`
Nah, it's fine, obvious enough, just tried to style nitpick and it didn't work too well ;)
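For readers unfamiliar with the idiom under discussion: starting the condition with a literal `0` lets every real clause begin with `||`, so clauses can sit inside `#if`/`#else` blocks without breaking the syntax. A rough sketch follows - the clause list and the Python-2 branch are illustrative guesses, not the PR's exact code:

```c
/* "Obvious single-entry cases": anything that can only ever be a
 * single index item is dispatched immediately. */
if (0
#if !defined(NPY_PY3K)
        || PyInt_CheckExact(index)
#else
        || PyLong_CheckExact(index)
#endif
        || index == Py_None
        || PySlice_Check(index)
        || PyArray_Check(index)) {
    /* handle the single-item index here */
}
```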
```c
#else
        || PyLong_CheckExact(index)
#endif
        || index == Py_None
```
brackets?
```c
        || PySequence_Check(tmp_obj)
        || PySlice_Check(tmp_obj)
        || tmp_obj == Py_Ellipsis
        || tmp_obj == Py_None) {
```
Again, I think we usually put brackets, but no big deal
Don't agree - we use brackets to make precedence of `||` and `&&` obvious, but a quick grep shows it's fairly uncommon to use them to aid reading precedence of `||`, `&&` and `==`, `>=`, ...
Ok, frankly don't care much, it's not `*a++` or something...
(force-pushed from 7ae8b44 to 68bad6a)
numpy/core/src/multiarray/mapping.c (Outdated)
```c
 * allocation, but doesn't need to be a fast path anyway
 */
if (PyTuple_Check(index)) {
    PyTupleObject *tup = PySequence_Tuple(index);
```
I think you're missing the Py_DECREF for `tup` here; you could have done a recursive call as well (first thought you did) instead of refactoring it out.
Thought refactoring it out was more transparent, but yes, I could have.
Fixed
Also apparently requires a cast, because `PySequence_Tuple` probably returns a `PyObject`.
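Putting those two review points together, the corrected block presumably ends up looking roughly like this sketch (not the PR's exact code; `unpack_tuple` and `result` stand in for whatever the tuple-unpacking helper and output buffer are actually called, and keeping the variable as `PyObject *` avoids the cast entirely):

```c
/* Tuple subclasses are normalized to an exact tuple first; this costs
 * an allocation, but doesn't need to be a fast path anyway. */
if (PyTuple_Check(index)) {
    /* PySequence_Tuple returns a PyObject*, so no PyTupleObject* here */
    PyObject *tup = PySequence_Tuple(index);
    if (tup == NULL) {
        return -1;
    }
    n = unpack_tuple(tup, result);  /* placeholder helper name */
    Py_DECREF(tup);                 /* the DECREF noted above */
    return n;
}
```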
(force-pushed from bf7a0c4 to c587963)
Should I just squash it some time?
Yep, I think squashing via github into one commit is the best plan. The git history is just clutter, but that means if people really care they can check this PR in its unmodified messy state.
Thanks.
Split off from #8276, extracting the tuple-conversion logic into its own function.

Things I'd like feedback on:

- This might worsen performance slightly for scalar indices, as it now forces the tuple conversion. Is there an easy way to tell if this is significant? Is the clarity vs efficiency tradeoff acceptable here?

Update: new approach that:

- Doesn't call `__getitem__` more than once on the provided sequence, if it is converted to a tuple
- … `master` for `a[[0, None]]`