MAINT: Make the refactor suggested in prepare_index #8278
Conversation
If this is merged, #4434 will need a rebase.
numpy/core/src/multiarray/mapping.c (Outdated)
```c
 * The index might be a multi-dimensional index, but not yet a tuple
 * this makes it a tuple in that case.
 *
 * TODO: Refactor into its own function.
```
As instructed here
@seberg Comments?
numpy/core/src/multiarray/mapping.c (Outdated)
```c
if (index == NULL) {
    return -1;
index_as_tuple = PySequence_Tuple(index);
if(index_as_tuple == NULL) {
```
Space before (
(also a few more times)
numpy/core/src/multiarray/mapping.c (Outdated)
```c
int ellipsis_pos = -1;

index = prepare_index_tuple(index);
if(index == NULL)
```
space, plus please add the curly brackets, we always put them in numpy
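For reference, the style being asked for here would look roughly like the fragment below - just the pattern (space after `if`, braces even around a single statement), not necessarily the PR's final code:

```c
index = prepare_index_tuple(index);
if (index == NULL) {
    return -1;
}
```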
numpy/core/src/multiarray/mapping.c (Outdated)
```c
else {
    n = PyTuple_GET_SIZE(index);
    n = PyTuple_GET_SIZE(index_as_tuple);
    if (n > NPY_MAXDIMS * 2) {
```
I am a bit curious, this block can probably be removed? Seems to be just an early error out, and I don't see a reason to optimize error speed ;).
It's not immediately clear to me where the late error-out is that this is protecting, but I guess I could remove it and see what breaks?
Well if you pass in a finished tuple you have to get the same error at some point.
This line is the part that handles finished tuples. The only way you can avoid this check is by passing a tuple subclass, which is probably not tested
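For context, the early-out being discussed amounts to something like the sketch below (shape taken from the hunk above; the exact error message raised in the PR is an assumption here):

```c
n = PyTuple_GET_SIZE(index_as_tuple);
if (n > NPY_MAXDIMS * 2) {
    /* fail immediately rather than later, while unpacking each item */
    PyErr_SetString(PyExc_IndexError, "too many indices for array");
    return -1;
}
```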
Could be good to see whether it makes a real speed difference to build that tuple (likely not, but then I may actually have timed it at the time and thought it might be nice to have). Then again, I am not sure whether …
Ref counting looks good on first sight, and yes, definitely all in the right place, the only issue may be to check whether doing a bit weirder code for speed may be worth it.
(force-pushed from fd4a920 to b32815f)
Ok, all the style things are fixed.
I think this is basically testable right now (I cannot build my numpy locally from master or my branch) as:

```python
>>> x = np.arange(1000)
>>> i = 100
>>> %timeit x[i]
The slowest run took 26.21 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 214 ns per loop
>>> %timeit x[(i,)]
The slowest run took 20.53 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 273 ns per loop
```

So as high as 30%, assuming this test is valid.
Hehe, honestly not sure it's worth troubling over. If you replace …
Probably, I am not sure, but somewhat thought those pack functions are not the quickest when it comes to micro optimization. Might also be that it does not make a difference at all.... EDIT: Those buildvalue functions …
Oh ok, nvm then.
There's a weird uncommented check on line 190 that seems to decide that lists of length MAX_DIM are not promoted to tuples, despite the fact that doing so would be valid up to …
Well, the check is only there for the tuple conversion trick, which I am not about to modify, since I think it's a crappy hack in any case. Most of the time, it is …
Any more thoughts on this?
Ok, just trying to pass through a few things. Not really, do you know if the slowdown got better now, or do you think we should just not worry about it too much?
I wasn't really able to profile this patch, being unable to build locally - the closest I could get was running on the released version, comparing:

These all seemed pretty similar, and the timing overhead seems to make the results pretty hard to compare. So I'd say that we shouldn't worry about it.
Ohh, you have problems building locally? Doesn't the …

OK, just ran it on my computer (python3; on python2 things might be different, because it goes through the C getitem function which directly gets the integer, would have to look up the path myself...). The new times first:

```python
In [1]: x = np.arange(1000)

In [2]: i = 100

In [3]: %timeit x[i]
10000000 loops, best of 3: 149 ns per loop / 116 ns

In [4]: x = np.arange(1000, dtype=object)

In [5]: %timeit x[i]
10000000 loops, best of 3: 119 ns per loop / 78.2 ns
```

So, hmmmmmm ;). Can't make up my mind, you are right that the difference is lower than doing something like …
Wait, why is …
Simple, it only has to do an incref and not create the scalar python object from what's inside the array.
I suppose this: I think we can get away with it, but would prefer to check whether it is not too ugly if we try to avoid it.
Ok, timing is now much better, and actually an improvement when passing tuples.

Before this PR:

```python
In [1]: a = np.array([1, 2, 3, 4])

In [2]: %timeit a[0]
The slowest run took 29.59 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 158 ns per loop

In [3]: %timeit a[0,]
The slowest run took 28.01 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 167 ns per loop
```

After the latest commit:

```python
In [1]: a = np.array([1, 2, 3, 4])

In [2]: %timeit a[0]
The slowest run took 32.66 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 157 ns per loop

In [2]: %timeit a[(0,)]
The slowest run took 26.58 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 158 ns per loop
```
numpy/core/src/multiarray/mapping.c (Outdated)
```c
        }
    }
}
PyObject **raw_indices[NPY_MAXDIMS*2];
```
We could also allocate these to the length we need, but there doesn't seem to be much precedent for doing that
I don't think the reference handling code is any different to what was here originally - in both cases, we borrow everything and increment nothing. The previous iteration allocated a tuple, so had to decref it when it was done - but here, we don't bother allocating a new sequence object, and just cache the result of …
No, GET_ITEM (the tuple macro) does return a borrowed reference; the sequence function does not (and cannot) do that. Before, a new tuple was created which would do the reference counting for us (and increment the reference counts). You have to hold on to the reference, since a custom sequence could return new references (say you got an `(x)range(10**3, 10**3+6)`; then the numbers returned will only have a single reference, at least for all you know).

So no, you will need to do reference counting on your "manual tuple", and I suppose we should/could add a test for it.
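A minimal C sketch of the distinction being made here, assuming the `raw_indices` buffer from the hunk above and the `multi_DECREF` helper that appears later in this PR (the length check against `2*NPY_MAXDIMS` is omitted for brevity; this is not the PR's actual code):

```c
/* PyTuple_GET_ITEM returns a *borrowed* reference: the tuple keeps the
 * item alive for us.  PySequence_GetItem returns a *new* reference: a
 * custom sequence may create the object on the fly, so the caller must
 * hold that reference and release it when done. */
PyObject *raw_indices[NPY_MAXDIMS * 2];
Py_ssize_t i, n;

n = PySequence_Size(index);
if (n < 0) {
    return -1;
}
for (i = 0; i < n; i++) {
    PyObject *item = PySequence_GetItem(index, i);  /* new reference */
    if (item == NULL) {
        multi_DECREF(raw_indices, i);  /* release what we already hold */
        return -1;
    }
    raw_indices[i] = item;  /* keep the reference until we are done */
}
/* ... use raw_indices[0..n) ... */
multi_DECREF(raw_indices, n);
```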
numpy/core/src/multiarray/mapping.c (Outdated)
```c
if (commit_to_unpack) {
    return n;
}

```
Just what I think right now (don't worry about it), but I would remove the blank line at least to make the else stick to the if block.
Done, since this was as much accidental as anything else
Damn, you're very right, and that makes this "manual tuple" a much less reasonable thing to work with.
Ok, added reference counting. This is still more performant than master, and definitely has the edge over the same code using normal …
(force-pushed from 3c41f62 to 8c4e556)
numpy/core/src/multiarray/mapping.c (Outdated)
```c
}

/* Passing a tuple subclass - needs to handle errors */
if (PyTuple_Check(index)) {
```
The code here is subtly different to what we had before. Calling `PySequence_Tuple(index)` invokes `__iter__`, whereas this invokes `__getitem__`. So tuple subclasses that implement those methods inconsistently now behave differently. For instance:

```python
class Plus1Tuple(tuple):
    def __getitem__(self, i):
        return super().__getitem__(i) + 1
    # oops, we forgot `__iter__`, and inherited `tuple.__iter__`, which
    # does not fall back on __getitem__
```

gives:

- Before: `a[Plus1Tuple([1, 2, 3])]` → `a[1, 2, 3]` (!)
- After: `a[Plus1Tuple([1, 2, 3])]` → `a[2, 3, 4]`
Can't you just remove this whole block and replace it with `commit_to_unpack = 1`?
OK, plus a check for too many indices.
I am fine with the changed behaviour I think, a tuple subclass should really be OK even with just using PyTuple_GET_ITEM to be honest, otherwise it should be considered badly broken.
Perhaps we should just bite the bullet here and call `tuple(tuplesubclass)`, since efficiency isn't important for this weird case.
Yea, I suppose we might just as well put whatever was there before, it won't make the code any uglier and speed really does not matter. But no need to add a test or so (or if you do, put a comment that it is fine to break it). This is too strange to really worry about.
Ok, changed to call `tuple(tuplesubclass)`, which makes our life a lot easier.
Looked at this more out of curiosity than anything else, so only some clarification requests.
numpy/core/src/multiarray/mapping.c (Outdated)
```c
 * borrowed reference.
 * @param result An empty buffer of 2*NPY_MAXDIMS PyObject* to write each
 *        index component to. The references written are new.
 *        This function will in some cases clobber the array beyond
```
The comments are very helpful, but here maybe be even more explicit and say "beyond the number of items returned".
Will do - I was struggling to phrase this
Ok, I've removed that remark entirely, as it made more sense under the description of the return value:

```c
 * @returns The number of items in `result`, or -1 if an error occured.
 *          The entries in `result` at and beyond this index should be
 *          assumed to contain garbage, even if they were initialized
 *          to NULL, so are not safe to Py_XDECREF.
```
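To illustrate what that contract means for a caller - a sketch only, where `unpack_indices` and `use_index_component` are placeholder names, not necessarily the identifiers used in the PR:

```c
PyObject *raw_indices[NPY_MAXDIMS * 2];
npy_intp i, n;

n = unpack_indices(index, raw_indices);  /* placeholder name */
if (n < 0) {
    return -1;  /* nothing in raw_indices is safe to touch */
}
for (i = 0; i < n; i++) {
    /* only the first n entries hold valid (new) references */
    use_index_component(raw_indices[i]);  /* placeholder */
}
multi_DECREF(raw_indices, n);  /* never Py_XDECREF past index n */
```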
numpy/core/src/multiarray/mapping.c (Outdated)
```c
}

/*
 * For some reason, anything that's long but not too long is turned into
```
This comment confuses me: is it totally unclear why this is done at all? I'd guess that here the point is that it is known that unpacking will fail (well, modulo the factor of 2), but that one should not preclude a very long list of, e.g., integers. If that is not the case, what would fail if one removed this? Should it be deprecated?
> is it totally unclear why this is done at all?

It's totally unclear to me why the 2 is missing. As a result, `x[[None] * 20]` and `x[(None,) * 20]` mean the same thing, yet `x[[None] * 40]` and `x[(None,) * 40]` mean different things (yet neither errors). Of course, someone might be relying on `x[[None] * 40]` meaning `x[[None] * 40,]`, so it's too late to fix it.

The rationale behind the `2*NPY_MAXDIMS` limit elsewhere is that the result is limited to this many dimensions - at best, you can use an `int` to remove every existing dimension, and then `None` to add them back again - so the longest valid index is assumed to be `(0,)*NPY_MAXDIMS + (None,)*NPY_MAXDIMS`. That's not really true either, as it ought to be legal to add an `Ellipsis` in there too...

> Should it be deprecated?

Arguably everything from this point down should be deprecated, as in #4434.
Hmm, a bit of a mess. But I think for someone reading the code later, it may be useful to explicitly mention your last point, i.e., that everything below here should arguably be deprecated (and refer to 4434).
Also done (but higher up than this line)
`x[[None] * 40]` should error? But yes, there are some awful examples, such as using lists of lists as an index. I would prefer things like "for some reason" to be replaced with things like "As described previously, for backwards compatibility" in general; it is, after all, the implementation of the comment just a bit further up. (All of this comes down to that Numeric compat thing! Except of course the 2*N, which I did because I thought "why not allow a bit longer indices just in case someone is crazy, no harm".)
I'll improve that wording
Improve comments [ci skip]
(force-pushed from 6d21f5a to 9832f05)
@seberg: Let me know if the refcounting now looks good, then I'll squash together the 5 most recent commits.
numpy/core/src/multiarray/mapping.c (Outdated)
```c
 * that the longest possible index that is allowed to produce a result is
 * `(0,)*np.MAXDIMS + (None,)*np.MAXDIMS`. This assumption turns out to be
 * wrong (we could add one more item, an Ellipsis), but we keep it for
 * compatibility.
```
You know, the 2* is pretty arbitrary, so you can increment it by one if you like, I just set it as a "high enough" value and yeah, forgot that in principle you can go one higher and still get a valid index.
Actually, there is no limit to the number of valid indices. You can index with `True` or `False` as many times as you like, and the dimensionality will only ever increase by one.
(although in practice, indexing with more than 32 causes problems elsewhere)
Sorry, just thought I would start on this again, don't have much time now so might forget again though, if I do and you want to come back to this, please don't hesitate to ping.
@eric-wieser no, maxdims*2+1 is actually the maximum; if you do None/True you add one, so you end up with at least that many dims ;).
```python
>>> a = np.arange(6).reshape(2, 3)
>>> a[True]
array([[[0, 1, 2],
        [3, 4, 5]]])
>>> a[(True,)*32]
array([[[0, 1, 2],
        [3, 4, 5]]])
>>> a[(True,)*33]
ValueError: Cannot construct an iterator with more than 32 operands (33 were requested)
```
OK, comment seems fine to me, could make it "is based ... longest reasonable index" or so.
Updated
Hehe, right, broadcasting of indices.... anyway, yeah, it's plenty as is.
OK, I think this code is pretty well tested nowadays, so I am fine with merging this. Eric, maybe you can have a glance over yourself one more time and then ping me and I will merge?
```c
@@ -139,6 +139,187 @@ PyArray_MapIterSwapAxes(PyArrayMapIterObject *mit, PyArrayObject **ret, int getm
    *ret = (PyArrayObject *)new;
}

NPY_NO_EXPORT NPY_INLINE void
multi_DECREF(PyObject **objects, npy_intp n)
```
First wasn't sure I like this, but it seems harmless :).
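(The signature above comes from the diff; its body, not shown here, is presumably just a loop along these lines - an assumption, not the PR's exact implementation:)

```c
/* Release n references held in a C array of PyObject pointers. */
NPY_NO_EXPORT NPY_INLINE void
multi_DECREF(PyObject **objects, npy_intp n)
{
    npy_intp i;

    for (i = 0; i < n; i++) {
        Py_DECREF(objects[i]);
    }
}
```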
numpy/core/src/multiarray/mapping.c (Outdated)
```c
}

/* Obvious single-entry cases */
if (0
```
OK, with those `#if`s, formatting it without the 0 is ugly I suppose.
Should optimize out anyway. Could be `if (0  /* to make macros below easier */`
Nah, it's fine, obvious enough, just tried to style nitpick and it didn't work too well ;)
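For readers unfamiliar with the idiom under discussion: starting the condition with a literal `0` lets every real clause begin with `||`, so clauses can sit inside `#if`/`#else` blocks without breaking the syntax. A rough sketch follows - the clause list and the Python-2 branch are illustrative guesses, not the PR's exact code:

```c
/* "Obvious single-entry cases": anything that can only ever be a
 * single index item is dispatched immediately. */
if (0
#if !defined(NPY_PY3K)
        || PyInt_CheckExact(index)
#else
        || PyLong_CheckExact(index)
#endif
        || index == Py_None
        || PySlice_Check(index)
        || PyArray_Check(index)) {
    /* handle the single-item index here */
}
```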
```c
#else
        || PyLong_CheckExact(index)
#endif
        || index == Py_None
```
brackets?
```c
        || PySequence_Check(tmp_obj)
        || PySlice_Check(tmp_obj)
        || tmp_obj == Py_Ellipsis
        || tmp_obj == Py_None) {
```
Again, I think we usually put brackets, but no big deal
Don't agree - we use brackets to make precedence of `||` and `&&` obvious, but a quick grep shows it's fairly uncommon to use them to aid reading precedence of `||`, `&&` and `==`, `>=`, ...
Ok, frankly don't care much, it's not `*a++` or something...
(force-pushed from 7ae8b44 to 68bad6a)
numpy/core/src/multiarray/mapping.c (Outdated)
```c
 * allocation, but doesn't need to be a fast path anyway
 */
if (PyTuple_Check(index)) {
    PyTupleObject *tup = PySequence_Tuple(index);
```
I think you're missing the Py_DECREF for `tup` here; you could have done a recursive call as well (first thought you did) instead of refactoring it out.
Thought refactoring it out was more transparent, but yes, I could have.
Fixed
Also apparently requires a cast, because `PySequence_Tuple` probably returns a `PyObject`.
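Putting those two review points together, the corrected block presumably ends up looking roughly like this sketch (not the PR's exact code; `unpack_tuple` and `result` stand in for whatever the tuple-unpacking helper and output buffer are actually called, and keeping the variable as `PyObject *` avoids the cast entirely):

```c
/* Tuple subclasses are normalized to an exact tuple first; this costs
 * an allocation, but doesn't need to be a fast path anyway. */
if (PyTuple_Check(index)) {
    /* PySequence_Tuple returns a PyObject*, so no PyTupleObject* here */
    PyObject *tup = PySequence_Tuple(index);
    if (tup == NULL) {
        return -1;
    }
    n = unpack_tuple(tup, result);  /* placeholder helper name */
    Py_DECREF(tup);                 /* the DECREF noted above */
    return n;
}
```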
(force-pushed from bf7a0c4 to c587963)
Should I just squash it some time?
Yep, I think squashing via github into one commit is the best plan. The git history is just clutter, but that means if people really care they can check this PR in its unmodified messy state.
Thanks.
Split off from #8276, extracting the tuple-conversion logic into its own function.

Things I'd like feedback on:

- This might worsen performance slightly for scalar indices, as it now forces the tuple conversion. Is there an easy way to tell if this is significant? Is the clarity vs efficiency tradeoff acceptable here?

Update: new approach that:

- Doesn't call `__getitem__` more than once on the provided sequence, if it is converted to a tuple
- … `master` for `a[[0, None]]`