8000 ENH: Add index_tricks.as_index_tuple by eric-wieser · Pull Request #8276 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: Add index_tricks.as_index_tuple #8276

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
ENH: Expose prepare_index_tuple as index_tricks.as_index_tuple
Approaches #8275
  • Loading branch information
eric-wieser committed Apr 8, 2017
commit a86cc8c63f417431c27b3c545b784aad9824319a
5 changes: 5 additions & 0 deletions doc/release/1.13.0-notes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -159,6 +159,11 @@ being iterated over.
For consistency with ``ndarray`` and ``broadcast``, ``d.ndim`` is a shorthand
for ``len(d.shape)``.

``as_index_tuple`` function in ``index_tricks``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Exposes the internal normalization that happens when indexing with non-tuple
objects to convert them into tuples. Useful for correctly overriding or
wrapping ndarray.__getitem__

Improvements
============
Expand Down
16 changes: 16 additions & 0 deletions numpy/add_newdocs.py
Original file line number Diff line number Diff line change
Expand Up @@ -5403,6 +5403,22 @@ def luf(lamdaexpr, *args, **kwargs):

""")

add_newdoc('numpy.core.multiarray', 'as_index_tuple',
"""
as_index_tuple(index)

Normalizes an index argument, like that passed to `__getitem__`, into a tuple.
Follows the invariant that `x[index]` is identical to `x[as_index_tuple(index)].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, but it doesn't really explain what it does do. There are lots of types of index normalization beyond ensuring you always get a tuple. It would be valuable to describe the precise type of the return value (e.g., can the result include Ellipsis? or lists? or arrays with boolean dtype?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stephan, you are a subclasser ;). Would having more of the normalization/information available be worth it? I am also a bit thinking again about the new indexing attributes idea in combination with subclasses.
Right now, this does not really change the index much at all, but I mean we could give all information that numpy has after its index preperation step without much hassle. Like that indexer object which is like

indexer = get_indexer(arr, index, maybe_also_allow_single_boolean=False)
arr[index] == indexer(arr)
indexer(arr_with_same_shape)  # works too
indexer.scalar_result
indexer.returns_view
indexer.type == "plain_indexing"

Dunno, you probably saw this suggestion back in the day, but....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is, is there any intermediate result of prepare_index that might in some situations be more useful than the final result? Right now, all I can think of is allowing non-integer np.arrays. Any one able to come up with something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess non-int arrays are somewhat thinkable since someone might do something like (texture) interpolation based on floating indices. Other then that, I have my doubts ;). The thing is, you can only go that far and actually maintain numpy compatibility.... It would otherwise quickly become some other kind of indexing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the only reason you'd want this feature in the first place is to implement some custom form of indexing that diverges from a numpy index. In the past, I've considered:

  • Wraparound indexing of various types
  • interpolation indexing (as you mention)
  • constant offset-indexing (arrays starting at 1! More seriously, useful for modelling discrete time filters starting at -k)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stephan, you are a subclasser ;)

To be clear, I only use duck-array types, no subclasses!

I don't know if this would be directly useful (maybe?) -- it's hard to say without a clear description of exactly what it does! In practice I would guess that most folks who write customized indexing code that works mostly but not exactly like NumPy will roll their own normalization (like I did in xarray).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shoyer actually, my biggest reason to think about it was making masked array indexing subclassing with oindex/vindex, etc. not a huge pile of odd python code (instead put the pile of odd code to C to make it better reusable ;)).

I understand that the casting part (or replacement of bool arrays, not allowing some odder objects) might not be ideal, but on the other hand, information like "numpy would make the result a copy, a scalar or a view" or expanding the ellipsis might be very nice. The tuple normalization is a rather small thing (it basically checks for a non-array sequence of length <= np.MAXDIMS, and if it contains another sequence, slice or None consider it a tuple).

For classes which do basically the same as numpy, it would seem more useful to actually provide some more information, for other classes I am unsure that providing this hack is actually much useful. Which basically just asks the question whether it is worth putting it in numpy instead of just stealing the normalization code from xarray ;).


Examples
--------
>>> as_index_tuple(1)
(1,)
>>> as_index_tuple([1, 2, None])
(1, 2, None)
>>> as_index_tuple([1, 2, 3])
([1, 2, 3])
""")

##############################################################################
#
Expand Down
34 changes: 34 additions & 0 deletions numpy/core/src/multiarray/mapping.c
Original file line number Diff line number Diff line change
Expand Up @@ -284,6 +284,40 @@ prepare_index_tuple(PyObject *index, PyObject **result)
}
}

/**
* Expose prepare_index_tuple to python code
*/
NPY_NO_EXPORT PyObject *
as_index_tuple(PyObject *NPY_UNUSED(self), PyObject *args)
{
PyObject *obj;
PyObject *result;
PyObject *prepared[NPY_MAXDIMS*2];
npy_intp i, n;

if (!PyArg_ParseTuple(args, "O", &obj)) {
return NULL;
}
n = prepare_index_tuple(obj, prepared);
if (n < 0) {
return NULL;
}

result = PyTuple_New(n);
if (result == NULL) {
return NULL;
}

for (i = 0; i < n; i++) {
PyObject *val = prepared[i];
Py_INCREF(val);
PyTuple_SET_ITEM(result, i, val);
}

return result;
}


/**
* Prepare an npy_index_object from the python slicing object.
*
Expand Down
3 changes: 3 additions & 0 deletions numpy/core/src/multiarray/mapping.h
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,9 @@ array_subscript(PyArrayObject *self, PyObject *op);
NPY_NO_EXPORT int
array_assign_item(PyArrayObject *self, Py_ssize_t i, PyObject *v);

NPY_NO_EXPORT PyObject *
as_index_tuple(PyObject *NPY_UNUSED(self), PyObject *args);

/*
* Prototypes for Mapping calls --- not part of the C-API
* because only useful as part of a getitem call.
Expand Down
3 changes: 3 additions & 0 deletions numpy/core/src/multiarray/multiarraymodule.c
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ NPY_NO_EXPORT int NPY_NUMUSERTYPES = 0;
#include "templ_common.h" /* for npy_mul_with_overflow_intp */
#include "compiled_base.h"
#include "mem_overlap.h"
#include "mapping.h" /* for as_index_tuple */

/* Only here for API compatibility */
NPY_NO_EXPORT PyTypeObject PyBigArray_Type;
Expand Down Expand Up @@ -4293,6 +4294,8 @@ static struct PyMethodDef array_module_methods[] = {
METH_VARARGS | METH_KEYWORDS, NULL},
{"normalize_axis_index", (PyCFunction)normalize_axis_index,
METH_VARARGS | METH_KEYWORDS, NULL},
{"as_index_tuple", (PyCFunction)as_index_tuple,
METH_VARARGS, NULL},
{NULL, NULL, 0, NULL} /* sentinel */
};

Expand Down
92 changes: 92 additions & 0 deletions numpy/core/tests/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -511,6 +511,98 @@ def test_indexing_array_negative_strides(self):
arr[slices] = 10
assert_array_equal(arr, 10.)

def test_as_index_tuple(self):
from numpy.core.multiarray import as_index_tuple

arr = np.array([1])
sl = np.s_[:]
ell = Ellipsis
obj = object()

# scalars are wrapped in a 1-tuple
assert_equal(as_index_tuple(1), (1,))
assert_equal(as_index_tuple(ell), (ell,))
assert_equal(as_index_tuple(None), (None,))
assert_equal(as_index_tuple(sl), (sl,))
assert_equal(as_index_tuple(arr), (arr,))
assert_equal(as_index_tuple(obj), (obj,))

# tuples are untouched
assert_equal(as_index_tuple((1, 2, 3)), (1, 2, 3))
assert_equal(as_index_tuple((1, 2, ell)), (1, 2, ell))
assert_equal(as_index_tuple((1, 2, None)), (1, 2, None))
assert_equal(as_index_tuple((1, 2, sl)), (1, 2, sl))
assert_equal(as_index_tuple((1, 2, arr)), (1, 2, arr))

# sequences of scalars are wrapped
assert_equal(as_index_tuple([1, 2, 3]), ([1, 2, 3],))

# sequences containing slice objects or ellipses are tuple-ified
assert_equal(as_index_tuple([1, 2, ell]), (1, 2, ell))
assert_equal(as_index_tuple([1, 2, None]), (1, 2, None))
assert_equal(as_index_tuple([1, 2, sl]), (1, 2, sl))
assert_equal(as_index_tuple([1, 2, arr]), (1, 2, arr))

# unless they are >= np.MAXDIMS, in which case they are always wrapped
nd = np.MAXDIMS
assert_equal(as_index_tuple(nd * [1]), (nd * [1],))
assert_equal(as_index_tuple(nd * [ell]), (nd * [ell],))
assert_equal(as_index_tuple(nd * [None]), (nd * [None],))
assert_equal(as_index_tuple(nd * [sl]), (nd * [sl],))
assert_equal(as_index_tuple(nd * [arr]), (nd * [arr],))

def test_as_index_tuple_broken_getitem(self):
from numpy.core.multiarray import as_index_tuple

# test sequences with a broken __getitem__
def make_broken_sequence(base, items):
class Broken(base):
def __len__(self):
return len(items)
def __getitem__(self, i):
val = items[i]
if isinstance(val, Exception):
raise val
return val
return Broken()

# error comes first, so just treat as a scalar
idx = make_broken_sequence(object, [1, ValueError(), None])
assert_raises(ValueError, operator.getitem, idx, 1)
assert_equal(as_index_tuple(idx), (idx,))

# none comes first, so commit to making the tuple
idx = make_broken_sequence(object, [1, None, ValueError()])
assert_raises(ValueError, operator.getitem, idx, 2)
assert_raises(ValueError, as_index_tuple, idx)

# tuples subclasses error in both cases
idx = make_broken_sequence(tuple, [1, ValueError(), None])
assert_raises(ValueError, operator.getitem, idx, 1)
assert_raises(ValueError, as_index_tuple, idx)

idx = make_broken_sequence(tuple, [1, None, ValueError()])
assert_raises(ValueError, operator.getitem, idx, 2)
assert_raises(ValueError, as_index_tuple, idx)
Copy link
Member Author
@eric-wieser eric-wieser Apr 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the test above it is a behaviour change - previously, idx.__getitem__ would never be called at all, and idx would be treated as if it were ().

This is because PySequence_Tuple(x) does not actually invoke type(x).__getitem__ if isinstance(x, tuple), presumably falling back on tuple.__getitem__.

Do we want to keep the old behaviour, or is it a bug?


def test_as_index_tuple_broken_len(self):
from numpy.core.multiarray import as_index_tuple

def make_badlen_sequence(base):
class cls(base):
def __len__(self): raise ValueError
def __getitem__(self, i): raise IndexError
return cls()

idx = make_badlen_sequence(object)
assert_raises(ValueError, len, idx)
assert_equal(as_index_tuple(idx), (idx,))

idx = make_badlen_sequence(tuple)
assert_raises(ValueError, len, idx)
assert_raises(ValueError, as_index_tuple, idx)


class TestFieldIndexing(TestCase):
def test_scalar_return_type(self):
# Field access on an array should return an array, even if it
Expand Down
2 changes: 1 addition & 1 deletion numpy/lib/index_tricks.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
from . import function_base
import numpy.matrixlib as matrix
from .function_base import diff
from numpy.core.multiarray import ravel_multi_index, unravel_index
from numpy.core.multiarray import ravel_multi_index, unravel_index, as_index_tuple
from numpy.lib.stride_tricks import as_strided

makemat = matrix.matrix
Expand Down
0