DEP: Deprecate non-tuple nd-indices by eric-wieser · Pull Request #9686 · numpy/numpy

DEP: Deprecate non-tuple nd-indices #9686

Merged (2 commits) on May 27, 2018

Conversation

@eric-wieser (Member) commented on Sep 14, 2017

A reminder of the purpose of this: currently we allow both arr[[None, 0]] and arr[(None, 0)] to mean the same thing, yet arr[[0, 0]] and arr[(0, 0)] mean different things. This makes it very hard to write a compliant subclass or duck array.
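
A minimal sketch of the inconsistency (shapes assume a small 2-D array; this is the pre-deprecation behaviour):

import numpy as np
arr = np.arange(12).reshape(3, 4)
arr[(0, 0)]     # plain nd-index: the single element arr[0, 0]
arr[[0, 0]]     # fancy index: rows 0 and 0, shape (2, 4)
arr[(None, 0)]  # new axis plus row 0, shape (1, 4)
arr[[None, 0]]  # historically treated like the tuple above, shape (1, 4); now emits a FutureWarning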

By deprecating this feature, we force downstream library code to stop using it, which in turn makes that library code more compatible with subclasses and duck types.


This reapplies @seberg's #4434, with the following changes:

  • tuple subclasses are not deprecated, because supporting namedtuples seems possibly useful (see the sketch after this list)
  • All warnings are FutureWarnings, since determining how np.array(...) will treat a sequence is expensive - it's easier to just say "this will error if np.array(seq) would"
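
A rough illustration of the namedtuple case mentioned in the first bullet (the Index name is purely illustrative):

from collections import namedtuple
import numpy as np
Index = namedtuple('Index', ['row', 'col'])
arr = np.arange(12).reshape(3, 4)
arr[Index(row=1, col=2)]  # a tuple subclass, so this keeps behaving like arr[1, 2]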

As of #10618, none of our code relies on this "feature"

@mhvk (Contributor) left a comment

Most of my comments just suggest making a tuple to start with. The many occurrences in our own code do make me wonder slightly whether this is in fact a good idea. I fear that constructing a list of lots of slice(None) plus a real slice is going to be common outside of numpy too...
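
For reference, a minimal sketch of the pattern in question and the fix this PR asks for (arr, axis, and n are placeholders):

import numpy as np
arr = np.zeros((4, 5, 6))
axis, n = 1, 3
index = [slice(None)] * arr.ndim   # the common downstream idiom
index[axis] = slice(0, n)
sub = arr[tuple(index)]            # wrapping in tuple() avoids the new FutureWarning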

@@ -69,13 +69,13 @@ def _raw_fft(a, n=None, axis=-1, init_function=fftpack.cffti,
if s[axis] > n:
index = [slice(None)]*len(s)
@mhvk (Contributor):

Maybe nicer to just fix the creation of the index:

index = (slice(None),) * axis + (slice(0, n),)

then there is no need to use tuple() below.

@@ -21,6 +21,12 @@ New functions
Deprecations
============

* Multidimensional indexing with anything but a base class tuple is
deprecated. This means that code such as ``arr[[slice(None)]]`` has to
@mhvk (Contributor):

How about giving an example with a slightly more useful case, e.g., [slice(None), slice(0, 10)] -> tuple([slice(None), slice(0, 10)]) (or simply the tuple literal (slice(None), slice(0, 10)))?

@@ -126,7 +126,7 @@ def test_arithmetic_drops_references(self):
def test_indexing_drops_references(self):
fp = memmap(self.tmpfp, dtype=self.dtype, mode='w+',
shape=self.shape)
tmp = fp[[(1, 2), (2, 3)]]
tmp = fp[((1, 2), (2, 3))]
@mhvk (Contributor):

Here, the parentheses are not necessary, correct?

@eric-wieser (Member, Author):

Fixed

@@ -69,13 +69,13 @@ def _raw_fft(a, n=None, axis=-1, init_function=fftpack.cffti,
if s[axis] > n:
index = [slice(None)]*len(s)
index[axis] = slice(0, n)
a = a[index]
a = a[tuple(index)]
else:
index = [slice(None)]*len(s)
index[axis] = slice(0, s[axis])
@mhvk (Contributor):

And same here.

index = (slice(None),) * axis + (slice(0, s[axis]),)

@@ -1745,7 +1745,7 @@ def gradient(f, *varargs, **kwargs):
slice4[axis] = slice(2, None)
@mhvk (Contributor):

If it were up to me, I'd just create a base prefix (slice(None),) * axis here and add the 1, 2, 3, 4 slices as needed below - which I think would make for clearer code. But that is getting well beyond the immediate purpose of this PR, so feel free to ignore.
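
A small sketch of that prefix idea (illustrative only; slice(2, None) is taken from the diff above, f and axis are placeholders):

import numpy as np
f = np.arange(24.0).reshape(4, 6)
axis = 1
prefix = (slice(None),) * axis        # common leading slices, built once
slice4 = prefix + (slice(2, None),)   # append the per-stencil slice directly
f[slice4]                             # no list mutation or tuple() conversion needed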

numpy/ma/core.py Outdated
@@ -5483,6 +5483,7 @@ def sort(self, axis=-1, kind='quicksort', order=None,
else:
idx = list(np.ix_(*[np.arange(x) for x in self.shape]))
idx[axis] = sidx
idx = tuple(idx)
@mhvk (Contributor):

Maybe

idx = np.ix_(*[(sidx if i == axis else np.arange(self.shape[i])) for i in range(self.ndim)])

well, maybe not really better at all.

@@ -723,6 +723,7 @@ def _median(a, axis=None, out=None, overwrite_input=False):
# as median (which is mean of empty slice = nan)
indexer = [slice(None)] * asorted.ndim
indexer[axis] = slice(0, 0)
indexer = tuple(indexer)
@mhvk (Contributor):

indexer = (slice(None),) * axis + (slice(0, 0),)

@@ -784,6 +785,7 @@ def replace_masked(s):
if np.issubdtype(asorted.dtype, np.inexact):
# avoid inf / x = masked
s = np.ma.sum([low, high], axis=0, out=out)
print(repr(s.data))
@mhvk (Contributor):

Probably not intended!

@eric-wieser (Member, Author):

Nice catch! Must have been in my working copy or stash or something.

@eric-wieser (Member, Author):

Fixed

@@ -1657,7 +1659,7 @@ def flatnotmasked_contiguous(a):
if not k:
result.append(slice(i, i + n))
i += n
return result or None
return tuple(result) or None
@mhvk (Contributor):

Also update the docstring! (It states a list is returned)

@eric-wieser (Member, Author):

Fixed

result.append(flatnotmasked_contiguous(a[idx]) or None)
return result
result.append(flatnotmasked_contiguous(a[tuple(idx)]) or None)
return tuple(result)
@mhvk (Contributor):

Again, do update the docstring.

@eric-wieser (Member, Author):

Fixed

@charris changed the title from "Rebase of #4434" to "DEP: Deprecate non-tuple nd-indices" on Sep 25, 2017
@charris added this to the 1.14.0 release milestone on Sep 25, 2017
@eric-wieser (Member, Author) commented on Sep 26, 2017

Just some notes from a performance perspective:

  • until #8278 (MAINT: Make the refactor suggested in prepare_index), passing lists was probably slower anyway, because you ended up doing a bunch of heuristics and then calling PyTuple_New on the list on the C side.
  • coercing a list to a tuple seems faster than building a tuple from scratch each time:
>>> a = np.zeros((2,)*10)

>>> sln = (slice(None),) # cache this to try and maximize speed
>>> %timeit a[sln*3 + (1,) + sln*6]
933 ns ± 20 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

>>> sl = list(sln * 10)
>>> %timeit sl[3] = 1; a[tuple(sl)]
858 ns ± 14.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

>>> %timeit sl[3] = 1; a[sl]  # passing a list is still fastest though
748 ns ± 27.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

@seberg (Member) commented on Sep 26, 2017 via email

@eric-wieser (Member, Author):

Weirdly, writing it out explicitly is slowest of all:

In [42]: %timeit a[:,:,:,1,:,:,:,:,:,:]
985 ns ± 10.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

@seberg (Member) commented on Sep 26, 2017

Dunno, that might be unfair since Python has to create the slice objects inside the timed statement, etc.
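
For what it's worth, a sketch of a fairer comparison that builds the slice objects outside the timed statement (same IPython %timeit style as above):

>>> idx = (slice(None),) * 3 + (1,) + (slice(None),) * 6  # precomputed once
>>> %timeit a[idx]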

@mhvk (Contributor) commented on Sep 26, 2017

The performance aspect probably doesn't matter so much; my comments were more about trying to make the code cleaner, with fewer conversions.

But I must admit that I've wondered why a direct list comprehension is so much faster than feeding a generator to list() or tuple():

In [1]: %timeit [_i for _i in range(100)]
100000 loops, best of 3: 5.75 µs per loop

In [2]: %timeit list(_i for _i in range(100))
100000 loops, best of 3: 9.94 µs per loop

In [3]: %timeit tuple(_i for _i in range(100))
100000 loops, best of 3: 10.1 µs per loop

@charris (Member) commented on Oct 18, 2017

Sounds like this might need more discussion.

@charris (Member) commented on Nov 26, 2017

Pushing off to 1.15.

@eric-wieser (Member, Author):

@mhvk: I'd prefer to leave the different tuple generation strategy to a different PR, and keep this change as small as possible.

@eric-wieser force-pushed the force-tuple branch 2 times, most recently from 397c822 to 8f469b3, on February 17, 2018
@eric-wieser changed the base branch from master to maintenance/1.14.x on March 16, 2018
@eric-wieser changed the base branch from maintenance/1.14.x back to master on March 16, 2018
"interpreted as an array index, `arr[np.array(seq)]`, "
"which will result either in an error or a different "
"result.") < 0) {
return -1;
@eric-wieser (Member, Author) commented on Mar 16, 2018

Think I might have missed some DECREFs here.

@eric-wieser (Member, Author):

The base switch above was to recompute the diff. Shame there's no button to do that.

@eric-wieser (Member, Author):

This may also need a doc change to https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing

@charris (Member) commented on May 25, 2018

@eric-wieser Want to make that change, or can I put this in?

@shoyer (Member) commented on Apr 18, 2019

Is there a plan for finishing this deprecation cycle? Would 1.17 be too soon to change this?

@eric-wieser (Member, Author) commented on Apr 18, 2019

My gut feeling is to leave this till (at least) 1.18 - this was quite an intrusive deprecation, and the PR references above suggest that downstream packages are still running into it every few months. Perhaps the deprecation NEP has a clear stance here.

Is there some reason you're hoping to get rid of this sooner? My main goal with this PR was to stop users writing code in this style, so that I could stop trying to replicate it in my own __getitem__, without actually caring whether the change ever happens.

@shoyer (Member) commented on Apr 18, 2019

Right now, numpy_array[duck_array] can fail in some but not all circumstances to be interpreted as numpy_array[np.asarray(duck_array)]. This resulted in an issue for JAX: jax-ml/jax#620

But I agree, it seems that we should wait a bit longer before changing this.

@eric-wieser (Member, Author):

Do the jax objects implement __array_ufunc__, or some other marker we could look for that means "do not use the deprecated behavior"?

@shoyer (Member) commented on Apr 18, 2019

They implement __array__ currently, and will soon implement both __array_ufunc__ and __array_function__ (jax-ml/jax#611).

@eric-wieser (Member, Author):

I think the presence of __array__ is probably a good indication that we should not treat them as a tuple. Perhaps add a check for that attribute to shortcut this path, as long as the type is not a list?
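
A rough Python sketch of the check being proposed (the real logic lives in C in prepare_index; the function name here is purely illustrative):

def looks_like_array_index(index):
    # the proposal: anything advertising __array__ (and not a plain list)
    # should bypass the deprecated sequence-as-tuple handling entirely
    return not isinstance(index, list) and hasattr(type(index), '__array__')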

@rgommers (Member):

> Perhaps the deprecation NEP has a clear stance here.

It won't have an opinion on any specific cases. That was a clear message from review: just express the principles, not current cases. I'm planning to get back to that NEP soon, by the way.

I agree with waiting till 1.18.

@eric-wieser (Member, Author) commented on Apr 18, 2019

We might be able to finish the deprecation now for everything except lists, which I think have been all of the downstream cases that mattered.

We could also consider bumping this to an np.VisibleDeprecationWarning to try and flush out remaining downstream projects.
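
For downstream projects, a hedged sketch of how remaining usages could be surfaced ahead of any such change (assumes a NumPy version where this is still only a warning; the index below is deliberately of the deprecated form):

import warnings
import numpy as np
with warnings.catch_warnings():
    # escalate the current FutureWarning (or a future VisibleDeprecationWarning)
    # so test suites fail loudly on non-tuple nd-indices
    warnings.simplefilter("error", FutureWarning)
    arr = np.zeros((3, 3))
    arr[[slice(None), 0]]  # raises instead of quietly warning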

@rgommers (Member):

> We could also consider bumping this to an np.VisibleDeprecationWarning to try and flush out remaining downstream projects.

I like that idea; if we can do that for 1.17, it will probably flush out many more issues for end users.

hawkinsp added commits to hawkinsp/numpy referencing this pull request (Feb 10-14, 2022): "This behavior has been deprecated since NumPy 1.15 (numpy#9686)."
melissawm pushed a commit to melissawm/numpy referencing this pull request (Apr 12, 2022) with the same message.
seberg pushed a commit to seberg/numpy referencing this pull request (Apr 24, 2022) with the same message.