-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: ndrange, like range, but multidimensional #12094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
78fe0c4
to
f4a8950
Compare
64fcf43
to
109aef9
Compare
This should go to the mailing list for more general discussion. Please describe there the motivation and a general overview of the way you chose to implement it. |
7973163
to
e60bd10
Compare
I'm not sure that |
interesting idea, so the syntax would be:
I've been debating getting rid of the order parameter all together. I included it because conceptually, some things are easier if you "do them along the first column first". Edit:
to iterate in F order |
Not sure if we need a release note, but this is a draft
|
And here is the patch for fotran ordering. I no longer think it is useful, but maybe somebody else wants a go at it. The tests might be useful. |
numpy/lib/index_tricks.py
Outdated
return all(i in r for i, r in zip(index, self._ranges)) | ||
|
||
def __getitem__(self, sl): | ||
# TODO: what is the correct way to handle non-tuple inputs? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior here seems good. But maybe we could add a nicer error message? Currently if you pass in a list, you get:
>>> np.ndrange((1,2,3))[[0,0,0]]
TypeError: range indices must be integers or slices, not list
It would be nice if that said something about how the index can be a tuple.
Related: #9686. We want to encourage people to index with tuples instead of lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how to get the not list
part. Checking for types in python is hard....
numpy/lib/index_tricks.py
Outdated
""" | ||
An N-dimensional range object that returns an iterator (or reversed | ||
iterator) tthat can produce a sequence of tuples from | ||
start (inclusive tuple) to stop (exclusive tuple) by step. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of this isn't quite accurate. It doesn't return an iterator, rather, it can be iterated.
What about more closely adapting the range
docstring?
There was a problem hiding this comment.
Choose a reason for 8000 hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried. Let me know what you think.
numpy/lib/index_tricks.py
Outdated
|
||
Unlike ``ndindex``, ``ndrange`` is not an iterator. | ||
You should prefer this ``ndrange`` object over ``ndindex`` as ``ndrange`` | ||
allows for multi-dimensional slicing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment belongs in the nditer docstring, but isn't needed in the ndrange one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. adapted.
Did this ever reach the mailing list as an API change? |
lets see how well 6 year old code survives.... |
38e3c75
to
9902492
Compare
I would love to run my benchmark again in 2024, but I keep running into:
I installed my environment with:
tips appreciated |
I used the power of copy and paste to obtain the following local benchmarks: In [11]: In [3]: %%timeit
...: ...: for it in np.ndrange((100, 100)):
...: ...: pass
...:
143 μs ± 2.51 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [12]:
...: In [4]: %%timeit
...: ...: for it in np.ndindex((100, 100)):
...: ...: pass
...: ...:
1.15 ms ± 16.4 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [13]:
In [13]: np.__version__
'2.1.1'
In [14]: import sys
In [15]: sys.version
'3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]' |
23ffe4c
to
69f83af
Compare
|
||
Notes | ||
----- | ||
.. versionadded:: 2.2.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this correct??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, assuming we review and merge before the 2.2 release :)
f4e868c
to
49055b5
Compare
And green |
@ahaldane @shoyer @eric-wieser as people involved in the original mailing list thread, any thoughts? |
In the many years since that mailing list discussion, I've gotten a bit more conservative in my taste for API design. It is still not clear to me who would use this feature. It feels like a solution that exists for the sake of "elegance" or "symmetry," not actual user needs. If the intended users are library authors doing indexing manipulations for arrays, then @asmeurer's ndindex is a much more complete solution. |
There are two places where we use it:
I understand if this feels niche. I can try to see how I can take the concepts of here and to make PRs to improve the performance of ndindex. Historically python 2.7 made it difficult. I couldn’t figure out how to allow for slicing and the speed up’s without breaking functionality, so I proposed ndrange. we use it to iterate over specific dimensions of our xarray all the time. |
Looking at the old discussion, this again seems like a hard PR to decide what to do with. I think a more active maintainer should make a decision BDFL-style. I'd be 60/40 in favor of accepting. The downside is the clutter of having two functions for largely the same thing, kind of like @shoyer pointed to the quantsight nditer project, which is new since this PR was opened. There's a lot of overlap, but while that project looks great it doesn't seem to have exactly the performance or convenience of an "nd version of python3's range" like ndindex or ndrange here. |
I’ll close this leaving it as a good idea for one user of jumpy. If an other user thinks it is good and causes this discussion to be revived. Then great! If nditer or an other project takes off. All the better. Keeping this as an internal pure python function in our own codebase is also fine. Thank you all for your time and review. I’m much more excited about spending our collective on performance improvement ideas I have ;) |
Fair enough. It's a nice idea on its own, too bad ndindex needs backcompat. If you have enough interest, you might consider proposing it to the quantsight project, or just put it up on github as a standalone. That might show people using it enough to upstream here. |
I've been using numpy arrays as a way to store collections of items in 2D. Mostly because of the powerful slicing numpy offers.
It became useful to iterate through the collection in various ways, often wanting to use pythonic tools like
range
.I wrote what I think is a cool
ndrange
class that behaves likerange
(I hope) but in ND.If this is in the scope of Numpy, I would appreciate feedback on how this might be merged in.
Performance concerns
@ahaldane asked on the mailing list if it would be better to implement this on top of nditer like
ndindex
is currently implemented for performance reasons.It seems that
itertools.product + range
(proposed ndrange implementation) is faster thanndinter + multi_index
(current ndindex implementation)It seems that
ndindex
can be sped up by introducingThis seems like a bad optimization for CPython though.
More details can be found here https://gist.github.com/hmaarrfk/a273b3d77cbec6e7b0d1c4f33ea65dd0
Previous attempts
I tried to extend the ndindex class, but it just became very ugly,
https://github.com/hmaarrfk/numpy/pull/1/files#diff-1bd953557a98073031ce66d05dbde3c8R661
and containment was strange because
ndindex
could be consumed itself. What would it mean to check if (0, 0, 0) was inndindex
after a few items had been consumed.This just won't fly in python 2 because range or xrange objects cannot be sliced into the way they are in python 3.I enabled python 2.7 support, but it isn't as elegant as 3.X support.Todo:
step
cannot be provided as a parameter whenstart
andstop
aren't both specified.ndrange(5, step=2)
should be illegal.Fixes: #6393
12094-before_rebase.patch.txt
Patch from 2018 before rebaseThe conflicts were minor, i just rebased...