8000 ENH: Add index_tricks.as_index_tuple by eric-wieser · Pull Request #8276 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

Conversation

eric-wieser
Copy link
Member
@eric-wieser eric-wieser commented Nov 14, 2016

Expose the function added in #8278 to help with #8275.
This was motivated by some confusion about the trigger for advanced indexing

Are these functions in the right places?

< 8000 /div>

Copy link
Member Author
@eric-wieser eric-wieser Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the more inefficient line. If this is unacceptable, then some extra arguments could enable/disable this behaviour. I would like this to stay in some form though, so that it can be exposed python-side

@eric-wieser eric-wieser force-pushed the refactor-prepare-index branch 4 times, most recently from 700a7ac to c575667 Compare November 15, 2016 00:29
@eric-wieser
Copy link
Member Author
eric-wieser commented Nov 15, 2016

Hmm, that last commit introduced the error Fatal Python error: ../Objects/codeobject.c:367 object at 0x7f6c2b868850 has negative ref count '0x2424242424242426' 8000 m, but only on travis

Fixed in 41bae74. Knew I wouldn't get refcounting right first time!

@seberg
Copy link
Member
seberg commented Nov 15, 2016

I have on question. What about recreating the tuple from the prepared index? Not because that saves the tuple creation otherwise (I thought i create the tuple in any case ;)). But because, given an input array, you will get a lot of other information out of it such as expanding the Ellipsis. Though I guess conversion to intp arrays may not be what you want.

The question is a bit what the use case is. Just that weird list to tuple hack that we would almost like to get rid off?

@eric-wieser
Copy link
Member Author
eric-wieser commented Nov 15, 2016

But because, given an input array, you will get a lot of other information out of it such as expanding the Ellipsis. Though I guess conversion to intp arrays may not be what you want.

I agree, this seems like a good idea. But as you say, it may be desirable to allow normally-forbidden array types to pass through, so there's the danger of doing more than the user wants, and having them resort to doing the whole thing from scratch.

Perhaps prepare_index should be more lenient in what it accepts, and then the default array_subscript should impose the tighter restrictions

The question is a bit what the use case is.

Right now, it's to replace these (incorrect) lines of code which start any overload of __getitem__:

def __getitem__(self, index):
    if not isinstance(index, tuple):
        index = (index,)

    # do some manipulation of `index`

    return super().__getitem__(index)

we would almost like to get rid of?

Now that this is refactored out, I'd be in favor of dropping a DeprecationWarning in this list-to-tuple hack, since it seems to be left over under the guise of compatibility with numeric


Would you like me to split this PR into the refactor and the python interface, such that the former can be merged and the latter discussed further?

@seberg
Copy link
Member
seberg commented Nov 15, 2016

I already have a very old PR adding that deprecation I think. But it did not do any refactoring (on the downside, I will have to do annoying merging for the oindex stuff but I can live with it, and don't have time to really pursue it anyway).

Note that it will be a long deprecation and you might want to check my PR if you want to start it, because it already includes a lot of the necessary test fixups as well. The problem is that lists as tuples are maybe annoying, but a very common thing when building an index, so they are also very common downstream.

If you are more interested in subclassing and indexing.... The biggest problem with adding oindex, etc. is for me how exactly to handle subclasses. If we can figure that out we could actually push it forward, so if you have energy/ideas there....

@seberg
Copy link
Member
seberg commented Nov 15, 2016

@eric-wieser the thing is.... What kind of manipulation do you do? Would that manipulation not possibly necessacitate more information which would be much easier available with the other information? Similar to the __numpy_getitem__ passing in things like "a copy will be made", as I once thought about for oindex, etc. But there were some other problems. I don't have time to think about all the things right now, just want to note that there may be other approaches, and we should think about it a little before implementing this hastily and noticing later that more would be better/needed if we ever do oindex/vindex.

EDIT: I had (maybe its still in the branch) some kind of indexer object, which had some extra information about the index and could be applied to a base array like

def __numpy_getitem__(self, indexer):
    print(indexer.copy)
    print(indexer.scalar)
    return wrap(indexer(self.data))

@eric-wieser
Copy link
Member Author
eric-wieser commented Nov 15, 2016

Can you link me to some reference on what oindex is?

And are you in favor of me splitting this PR?

@seberg
Copy link
Member
seberg commented Nov 15, 2016

@eric-wieser, gladly ;).

@seberg
Copy link
Member
seberg commented Nov 15, 2016

The indexer thing seems currently in the oindex PR, but I guess all thoughts about it were only in some mails to the list.

@eric-wieser eric-wieser changed the title Make the refactor suggested in prepare_index ENH: Add index_tricks.as_index_tuple Nov 15, 2016
@eric-wieser eric-wieser force-pushed the refactor-prepare-index branch from 69ef1b4 to ef117bd Compare November 15, 2016 13:20
@eric-wieser
Copy link
Member Author

I've gone ahead and split this PR, and squashed all the fixup commits

@homu
Copy link
Contributor
homu commented Nov 17, 2016

☔ The latest upstream changes (presumably #8287) made this pull request unmergeable. Please resolve the merge conflicts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, but it doesn't really explain what it does do. There are lots of types of index normalization beyond ensuring you always get a tuple. It would be valuable to describe the precise type of the return value (e.g., can the result include Ellipsis? or lists? or arrays with boolean dtype?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stephan, you are a subclasser ;). Would having more of the normalization/information available be worth it? I am also a bit thinking again about the new indexing attributes idea in combination with subclasses.
Right now, this does not really change the index much at all, but I mean we could give all information that numpy has after its index preperation step without much hassle. Like that indexer object which is like

indexer = get_indexer(arr, index, maybe_also_allow_single_boolean=False)
arr[index] == indexer(arr)
indexer(arr_with_same_shape)  # works too
indexer.scalar_result
indexer.returns_view
indexer.type == "plain_indexing"

Dunno, you probably saw this suggestion back in the day, but....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is, is there any intermediate result of prepare_index that might in some situations be more useful than the final result? Right now, all I can think of is allowing non-integer np.arrays. Any one able to come up with something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess non-int arrays are somewhat thinkable since someone might do something like (texture) interpolation based on floating indices. Other then that, I have my doubts ;). The thing is, you can only go that far and actually maintain numpy compatibility.... It would otherwise quickly become some other kind of indexing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the only reason you'd want this feature in the first place is to implement some custom form of indexing that diverges from a numpy index. In the past, I've considered:

  • Wraparound indexing of various types
  • interpolation indexing (as you mention)
  • constant offset-indexing (arrays starting at 1! More seriously, useful for modelling discrete time filters starting at -k)

Copy link
Member