8000 ENH: Add index_tricks.as_index_tuple by eric-wieser · Pull Request #8276 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: Add index_tricks.as_index_tuple #8276

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

eric-wieser
Copy link
Member
@eric-wieser eric-wieser commented Nov 14, 2016

Expose the function added in #8278 to help with #8275.
This was motivated by some confusion about the trigger for advanced indexing

Are these functions in the right places?

index_ndim = 1;
/* If the index is not a tuple, conver it into (index,) */
if (!make_tuple && !PyTuple_CheckExact(index)) {
index_as_tuple = Py_BuildValue("(O)", index);
Copy link
Member Author
@eric-wieser eric-wieser Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the more inefficient line. If this is unacceptable, then some extra arguments could enable/disable this behaviour. I would like this to stay in some form though, so that it can be exposed python-side

@eric-wieser eric-wieser force-pushed the refactor-prepare-index branch 4 times, most recently from 700a7ac to c575667 Compare November 15, 2016 00:29
@eric-wieser
Copy link
Member Author
eric-wieser commented Nov 15, 2016

Hmm, that last commit introduced the error Fatal Python error: ../Objects/codeobject.c:367 object at 0x7f6c2b868850 has negative ref count '0x2424242424242426'm, but only on travis

Fixed in 41bae74. Knew I wouldn't get refcounting right first time!

@seberg
Copy link
Member
seberg commented Nov 15, 2016

I have on question. What about recreating the tuple from the prepared index? Not because that saves the tuple creation otherwise (I thought i create the tuple in any case ;)). But because, given an input array, you will get a lot of other information out of it such as expanding the Ellipsis. Though I guess conversion to intp arrays may not be what you want.

The question is a bit what the use case is. Just that weird list to tuple hack that we would almost like to get rid off?

@eric-wieser
Copy link
Member Author
eric-wieser commented Nov 15, 2016

But because, given an input array, you will get a lot of other information out of it such as expanding the Ellipsis. Though I guess conversion to intp arrays may not be what you want.

I agree, this seems like a good idea. But as you say, it may be desirable to allow normally-forbidden array types to pass through, so there's the danger of doing more than the user wants, and having them resort to doing the whole thing from scratch.

Perhaps prepare_index should be more lenient in what it accepts, and then the default array_subscript should impose the tighter restrictions

The question is a bit what the use case is.

Right now, it's to replace these (incorrect) lines of code which start any overload of __getitem__:

def __getitem__(self, index):
    if not isinstance(index, tuple):
        index = (index,)

    # do some manipulation of `index`

    return super().__getitem__(index)

we would almost like to get rid of?

Now that this is refactored out, I'd be in favor of dropping a DeprecationWarning in this list-to-tuple hack, since it seems to be left over under the guise of compatibility with numeric


Would you like me to split this PR into the refactor and the python interface, such that the former can be merged and the latter discussed further?

@seberg
Copy link
Member
seberg commented Nov 15, 2016

I already have a very old PR adding that deprecation I think. But it did not do any refactoring (on the downside, I will have to do annoying merging for the oindex stuff but I can live with it, and don't have time to really pursue it anyway).

Note that it will be a long deprecation and you might want to check my PR if you want to start it, because it already includes a lot of the necessary test fixups as well. The problem is that lists as tuples are maybe annoying, but a very common thing when building an index, so they are also very common downstream.

If you are more interested in subclassing and indexing.... The biggest problem with adding oindex, etc. is for me how exactly to handle subclasses. If we can figure that out we could actually push it forward, so if you have energy/ideas there....

@seberg
Copy link
Member
seberg commented Nov 15, 2016

@eric-wieser the thing is.... What kind of manipulation do you do? Would that manipulation not possibly necessacitate more information which would be much easier available with the other information? Similar to the __numpy_getitem__ passing in things like "a copy will be made", as I once thought about for oindex, etc. But there were some other problems. I don't have time to think about all the things right now, just want to note that there may be other approaches, and we should think about it a little before implementing this hastily and noticing later that more would be better/needed if we ever do oindex/vindex.

EDIT: I had (maybe its still in the branch) some kind of indexer object, which had some extra information about the index and could be applied to a base array like

def __numpy_getitem__(self, indexer):
    print(indexer.copy)
    print(indexer.scalar)
    return wrap(indexer(self.data))

@eric-wieser
Copy link
Member Author
eric-wieser commented Nov 15, 2016

Can you link me to some reference on what oindex is?

And are you in favor of me splitting this PR?

@seberg
Copy link
Member
seberg commented Nov 15, 2016

@eric-wieser, gladly ;).

8000

@seberg
Copy link
Member
seberg commented Nov 15, 2016

The indexer thing seems currently in the oindex PR, but I guess all thoughts about it were only in some mails to the list.

@eric-wieser eric-wieser changed the title Make the refactor suggested in prepare_index ENH: Add index_tricks.as_index_tuple Nov 15, 2016
@eric-wieser eric-wieser force-pushed the refactor-prepare-index branch from 69ef1b4 to ef117bd Compare November 15, 2016 13:20
@eric-wieser
Copy link
Member Author

I've gone ahead and split this PR, and squashed all the fixup commits

@homu
Copy link
Contributor
homu commented Nov 17, 2016

☔ The latest upstream changes (presumably #8287) made this pull request unmergeable. Please resolve the merge conflicts.

8000

as_index_tuple(index)

Normalizes an index argument, like that passed to `__getitem__`, into a tuple.
Follows the invariant that `x[index]` is identical to `x[as_index_tuple(index)].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, but it doesn't really explain what it does do. There are lots of types of index normalization beyond ensuring you always get a tuple. It would be valuable to describe the precise type of the return value (e.g., can the result include Ellipsis? or lists? or arrays with boolean dtype?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stephan, you are a subclasser ;). Would having more of the normalization/information available be worth it? I am also a bit thinking again about the new indexing attributes idea in combination with subclasses.
Right now, this does not really change the index much at all, but I mean we could give all information that numpy has after its index preperation step without much hassle. Like that indexer object which is like

indexer = get_indexer(arr, index, maybe_also_allow_single_boolean=False)
arr[index] == indexer(arr)
indexer(arr_with_same_shape)  # works too
indexer.scalar_result
indexer.returns_view
indexer.type == "plain_indexing"

Dunno, you probably saw this suggestion back in the day, but....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is, is there any intermediate result of prepare_index that might in some situations be more useful than the final result? Right now, all I can think of is allowing non-integer np.arrays. Any one able to come up with something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess non-int arrays are somewhat thinkable since someone might do something like (texture) interpolation based on floating indices. Other then that, I have my doubts ;). The thing is, you can only go that far and actually maintain numpy compatibility.... It would otherwise quickly become some other kind of indexing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the only reason you'd want this feature in the first place is to implement some custom form of indexing that diverges from a numpy index. In the past, I've considered:

  • Wraparound indexing of various types
  • interpolation indexing (as you mention)
  • constant offset-indexing (arrays starting at 1! More seriously, useful for modelling discrete time filters starting at -k)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stephan, you are a subclasser ;)

To be clear, I only use duck-array types, no subclasses!

I don't know if this would be directly useful (maybe?) -- it's hard to say without a clear description of exactly what it does! In practice I would guess that most folks who write customized indexing code that works mostly but not exactly like NumPy will roll their own normalization (like I did in xarray).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shoyer actually, my biggest reason to think about it was making masked array indexing subclassing with oindex/vindex, etc. not a huge pile of odd python code (instead put the pile of odd code to C to make it better reusable ;)).

I understand that the casting part (or replacement of bool arrays, not allowing some odder objects) might not be ideal, but on the other hand, information like "numpy would make the result a copy, a scalar or a view" or expanding the ellipsis might be very nice. The tuple normalization is a rather small thing (it basically checks for a non-array sequence of length <= np.MAXDIMS, and if it contains another sequence, slice or None consider it a tuple).

For classes which do basically the same as numpy, it would seem more useful to actually provide some more information, for other classes I am unsure that providing this hack is actually much useful. Which basically just asks the question whether it is worth putting it in numpy instead of just stealing the normalization code from xarray ;).

@charris
Copy link
Member
charris commented Jan 22, 2017

@eric-wieser Needs rebase.

@eric-wieser eric-wieser force-pushed the refactor-prepare-index branch from ef117bd to 488784b Compare April 7, 2017 21:16
@eric-wieser eric-wieser force-pushed the refactor-prepare-index branch from 488784b to a86cc8c Compare April 8, 2017 09:48
@eric-wieser
Copy link
Member Author

Rebased, and adds a tonne of tests. This shares commits with #8278, so should be merged second


idx = make_broken_sequence(tuple, [1, None, ValueError()])
assert_raises(ValueError, operator.getitem, idx, 2)
assert_raises(ValueError, as_index_tuple, idx)
Copy link
Member Author
@eric-wieser eric-wieser Apr 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the test above it is a behaviour change - previously, idx.__getitem__ would never be called at all, and idx would be treated as if it were ().

This is because PySequence_Tuple(x) does not actually invoke type(x).__getitem__ if isinstance(x, tuple), presumably falling back on tuple.__getitem__.

Do we want to keep the old behaviour, or is it a bug?

@eric-wieser
Copy link
Member Author

Closing in favor of #9686, which removes the need for this function.

To support only the non-deprecated behavior, it can be trivially written as:

def as_index_tuple(item):
    if not isinstance(item, tuple):
        item = (item,)
    return item

Which is simple enough to leave to the end user.

@godaygo
Copy link
Contributor
godaygo commented Apr 22, 2018

Do not get me wrong - just curiosity, but I always wondered why the use of ternary operator in Python is somewhat ignored?

 def as_index_tuple(item):
    return (item if isinstance(item, tuple) else (item,))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0