-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: Add index_tricks.as_index_tuple #8276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
numpy/core/src/multiarray/mapping.c
Outdated
index_ndim = 1; | ||
/* If the index is not a tuple, conver it into (index,) */ | ||
if (!make_tuple && !PyTuple_CheckExact(index)) { | ||
index_as_tuple = Py_BuildValue("(O)", index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the more inefficient line. If this is unacceptable, then some extra arguments could enable/disable this behaviour. I would like this to stay in some form though, so that it can be exposed python-side
700a7ac
to
c575667
Compare
Fixed in 41bae74. Knew I wouldn't get refcounting right first time! |
I have on question. What about recreating the tuple from the prepared index? Not because that saves the tuple creation otherwise (I thought i create the tuple in any case ;)). But because, given an input array, you will get a lot of other information out of it such as expanding the Ellipsis. Though I guess conversion to The question is a bit what the use case is. Just that weird list to tuple hack that we would almost like to get rid off? |
I agree, this seems like a good idea. But as you say, it may be desirable to allow normally-forbidden array types to pass through, so there's the danger of doing more than the user wants, and having them resort to doing the whole thing from scratch. Perhaps
Right now, it's to replace these (incorrect) lines of code which start any overload of def __getitem__(self, index):
if not isinstance(index, tuple):
index = (index,)
# do some manipulation of `index`
return super().__getitem__(index)
Now that this is refactored out, I'd be in favor of dropping a DeprecationWarning in this list-to-tuple hack, since it seems to be left over under the guise of compatibility with Would you like me to split this PR into the refactor and the python interface, such that the former can be merged and the latter discussed further? |
I already have a very old PR adding that deprecation I think. But it did not do any refactoring (on the downside, I will have to do annoying merging for the oindex stuff but I can live with it, and don't have time to really pursue it anyway). Note that it will be a long deprecation and you might want to check my PR if you want to start it, because it already includes a lot of the necessary test fixups as well. The problem is that lists as tuples are maybe annoying, but a very common thing when building an index, so they are also very common downstream. If you are more interested in subclassing and indexing.... The biggest problem with adding oindex, etc. is for me how exactly to handle subclasses. If we can figure that out we could actually push it forward, so if you have energy/ideas there.... |
@eric-wieser the thing is.... What kind of manipulation do you do? Would that manipulation not possibly necessacitate more information which would be much easier available with the other information? Similar to the EDIT: I had (maybe its still in the branch) some kind of
|
Can you link me to some reference on what And are you in favor of me splitting this PR? |
@eric-wieser, gladly ;).
|
The indexer thing seems currently in the oindex PR, but I guess all thoughts about it were only in some mails to the list. |
69ef1b4
to
ef117bd
Compare
I've gone ahead and split this PR, and squashed all the fixup commits |
☔ The latest upstream changes (presumably #8287) made this pull request unmergeable. Please resolve the merge conflicts. 8000 |
as_index_tuple(index) | ||
|
||
Normalizes an index argument, like that passed to `__getitem__`, into a tuple. | ||
Follows the invariant that `x[index]` is identical to `x[as_index_tuple(index)]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, but it doesn't really explain what it does do. There are lots of types of index normalization beyond ensuring you always get a tuple. It would be valuable to describe the precise type of the return value (e.g., can the result include Ellipsis? or lists? or arrays with boolean dtype?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stephan, you are a subclasser ;). Would having more of the normalization/information available be worth it? I am also a bit thinking again about the new indexing attributes idea in combination with subclasses.
Right now, this does not really change the index much at all, but I mean we could give all information that numpy has after its index preperation step without much hassle. Like that indexer object which is like
indexer = get_indexer(arr, index, maybe_also_allow_single_boolean=False)
arr[index] == indexer(arr)
indexer(arr_with_same_shape) # works too
indexer.scalar_result
indexer.returns_view
indexer.type == "plain_indexing"
Dunno, you probably saw this suggestion back in the day, but....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is, is there any intermediate result of prepare_index
that might in some situations be more useful than the final result? Right now, all I can think of is allowing non-integer np.array
s. Any one able to come up with something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess non-int arrays are somewhat thinkable since someone might do something like (texture) interpolation based on floating indices. Other then that, I have my doubts ;). The thing is, you can only go that far and actually maintain numpy compatibility.... It would otherwise quickly become some other kind of indexing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel the only reason you'd want this feature in the first place is to implement some custom form of indexing that diverges from a numpy index. In the past, I've considered:
- Wraparound indexing of various types
- interpolation indexing (as you mention)
- constant offset-indexing (arrays starting at 1! More seriously, useful for modelling discrete time filters starting at -k)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stephan, you are a subclasser ;)
To be clear, I only use duck-array types, no subclasses!
I don't know if this would be directly useful (maybe?) -- it's hard to say without a clear description of exactly what it does! In practice I would guess that most folks who write customized indexing code that works mostly but not exactly like NumPy will roll their own normalization (like I did in xarray).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyer actually, my biggest reason to think about it was making masked array indexing subclassing with oindex/vindex, etc. not a huge pile of odd python code (instead put the pile of odd code to C to make it better reusable ;)).
I understand that the casting part (or replacement of bool arrays, not allowing some odder objects) might not be ideal, but on the other hand, information like "numpy would make the result a copy, a scalar or a view" or expanding the ellipsis might be very nice. The tuple normalization is a rather small thing (it basically checks for a non-array sequence of length <= np.MAXDIMS, and if it contains another sequence, slice or None consider it a tuple).
For classes which do basically the same as numpy, it would seem more useful to actually provide some more information, for other classes I am unsure that providing this hack is actually much useful. Which basically just asks the question whether it is worth putting it in numpy instead of just stealing the normalization code from xarray ;).
@eric-wieser Needs rebase. |
This eliminates all tuple constructors, in favor of working with a C array
ef117bd
to
488784b
Compare
488784b
to
a86cc8c
Compare
Rebased, and adds a tonne of tests. This shares commits with #8278, so should be merged second |
|
||
idx = make_broken_sequence(tuple, [1, None, ValueError()]) | ||
assert_raises(ValueError, operator.getitem, idx, 2) | ||
assert_raises(ValueError, as_index_tuple, idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the test above it is a behaviour change - previously, idx.__getitem__
would never be called at all, and idx
would be treated as if it were ()
.
This is because PySequence_Tuple(x)
does not actually invoke type(x).__getitem__
if isinstance(x, tuple)
, presumably falling back on tuple.__getitem__
.
Do we want to keep the old behaviour, or is it a bug?
Closing in favor of #9686, which removes the need for this function. To support only the non-deprecated behavior, it can be trivially written as: def as_index_tuple(item):
if not isinstance(item, tuple):
item = (item,)
return item Which is simple enough to leave to the end user. |
Do not get me wrong - just curiosity, but I always wondered why the use of ternary operator in Python is somewhat ignored? def as_index_tuple(item):
return (item if isinstance(item, tuple) else (item,)) |
Expose the function added in #8278 to help with #8275.
This was motivated by some confusion about the trigger for advanced indexing
Are these functions in the right places?