8000 ENH: ndrange, like range, but multidimensional by hmaarrfk · Pull Request #12094 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: ndrange, like range, but multidimensional #12094

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

hmaarrfk
Copy link
Contributor
@hmaarrfk hmaarrfk commented Oct 5, 2018

I've been using numpy arrays as a way to store collections of items in 2D. Mostly because of the powerful slicing numpy offers.

It became useful to iterate through the collection in various ways, often wanting to use pythonic tools like range.

I wrote what I think is a cool ndrange class that behaves like range (I hope) but in ND.

If this is in the scope of Numpy, I would appreciate feedback on how this might be merged in.

Performance concerns

@ahaldane asked on the mailing list if it would be better to implement this on top of nditer like ndindex is currently implemented for performance reasons.

It seems that itertools.product + range (proposed ndrange implementation) is faster than ndinter + multi_index (current ndindex implementation)

In [3]: %%timeit 
   ...: for it in np.ndrange((100, 100)): 
   ...:     pass 
   ...:      
   ...:                                                                         
238 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [4]: %%timeit 
   ...: for it in np.ndindex((100, 100)): 
   ...:     pass 
   ...:      
   ...:                                                                         
3.67 ms ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It seems that ndindex can be sped up by introducing

    def __init__(self, *shape):
        # [...]
        # defining a generator expression here as opposed to doing this
        # in the __next__ seems to speed ndindex by a factor of 1.7
        self._index_iter = (self._it.multi_index for _ in self._it)
    def __next__(self):
         return next(self._index_iter)

This seems like a bad optimization for CPython though.
More details can be found here https://gist.github.com/hmaarrfk/a273b3d77cbec6e7b0d1c4f33ea65dd0

Previous attempts

I tried to extend the ndindex class, but it just became very ugly,
https://github.com/hmaarrfk/numpy/pull/1/files#diff-1bd953557a98073031ce66d05dbde3c8R661
and containment was strange because ndindex could be consumed itself. What would it mean to check if (0, 0, 0) was in ndindex after a few items had been consumed.

This just won't fly in python 2 because range or xrange objects cannot be sliced into the way they are in python 3. I enabled python 2.7 support, but it isn't as elegant as 3.X support.

Todo:

  • Add docstrings
  • harden tests for failers
  • decide on how to handle lists as inputs
  • Remove development comments
  • Finish the rest of testing for the properties of ndrange
  • Add tests for ravel
  • Are flat and ravel required to return standard numpy classes?
  • Squash the PRs into ENH, TST (once the fate of ravel and flat is determined).
  • Enforce that step cannot be provided as a parameter when start and stop aren't both specified. ndrange(5, step=2) should be illegal.

Fixes: #6393
12094-before_rebase.patch.txt

Patch from 2018 before rebase The conflicts were minor, i just rebased...

@hmaarrfk hmaarrfk force-pushed the ndrange branch 3 times, most recently from 78fe0c4 to f4a8950 Compare October 5, 2018 17:00
@hmaarrfk hmaarrfk force-pushed the ndrange branch 3 times, most recently from 64fcf43 to 109aef9 Compare October 6, 2018 05:46
@hmaarrfk hmaarrfk changed the title WIP: Feedback request: FEAT: ndrange, like range, but multidimensiontal WIP: Feedback request: FEAT: Py3 only: ndrange, like range, but multidimensiontal Oct 6, 2018
@hmaarrfk hmaarrfk changed the title WIP: Feedback request: FEAT: Py3 only: ndrange, like range, but multidimensiontal WIP: Feedback request: FEAT: ndrange, like range, but multidimensiontal Oct 7, 2018
@hmaarrfk hmaarrfk changed the title WIP: Feedback request: FEAT: ndrange, like range, but multidimensiontal FEAT: ndrange, like range, but multidimensiontal Oct 7, 2018
@mattip mattip changed the title FEAT: ndrange, like range, but multidimensiontal ENH: ndrange, like range, but multidimensiontal Oct 7, 2018
@mattip
Copy link
Member
mattip commented Oct 7, 2018

This should go to the mailing list for more general discussion. Please describe there the motivation and a general overview of the way you chose to implement it.

@hmaarrfk hmaarrfk changed the title ENH: ndrange, like range, but multidimensiontal WIP: ENH: ndrange, like range, but multidimensiontal Oct 7, 2018
@hmaarrfk hmaarrfk changed the title WIP: ENH: ndrange, like range, but multidimensiontal ENH: ndrange, like range, but multidimensiontal Oct 7, 2018
@hmaarrfk hmaarrfk force-pushed the ndrange branch 2 times, most recently from 7973163 to e60bd10 Compare October 7, 2018 15:32
@eric-wieser
Copy link
Member

I'm not sure that ndrange should have a concept of an order - that sounds like a property of its iterator, just like a list object does not have a reversed attribute.

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Oct 8, 2018

interesting idea, so the syntax would be:

for i in iter(ndrange(arr.shape)[::2], order='F'):
    print(i)

I've been debating getting rid of the order parameter all together. I included it because conceptually, some things are easier if you "do them along the first column first".

Edit:
This syntax doesn't even work. Oh well... I'm slowly growing in favour of just not having order and simply letting people do this:

for i in np.ndrange(arr.shape[::-1]):
    i = i[::-1]

to iterate in F order

@hmaarrfk
Copy link
Contributor Author

Not sure if we need a release note, but this is a draft

``ndrange`` a multi-dimensional range-like object
-------------------------------------------------

``ndrange`` now supersedes the ``ndindex`` object for generating multi-index
iterators. ``ndrange`` behaves much like the Python 3 ``range`` object and
allows multi-dimensional slicing, iteration, reversal and efficient containment
lookup. ``ndindex`` will continue to be maintained for backward compatibility.

@hmaarrfk
Copy link
Contributor Author

And here is the patch for fotran ordering. I no longer think it is useful, but maybe somebody else wants a go at it. The tests might be useful.

ndrange_fortran_ordering_patch_and_tests.patch.txt

return all(i in r for i, r in zip(index, self._ranges))

def __getitem__(self, sl):
# TODO: what is the correct way to handle non-tuple inputs?
Copy link
Member
@ahaldane ahaldane Oct 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior here seems good. But maybe we could add a nicer error message? Currently if you pass in a list, you get:

>>> np.ndrange((1,2,3))[[0,0,0]]
TypeError: range indices must be integers or slices, not list

It would be nice if that said something about how the index can be a tuple.

Related: #9686. We want to encourage people to index with tuples instead of lists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to get the not list part. Checking for types in python is hard....

"""
An N-dimensional range object that returns an iterator (or reversed
iterator) tthat can produce a sequence of tuples from
start (inclusive tuple) to stop (exclusive tuple) by step.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of this isn't quite accurate. It doesn't return an iterator, rather, it can be iterated.

What about more closely adapting the range docstring?

Copy link
Contributor Author

Choose a reason for 8000 hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried. Let me know what you think.


Unlike ``ndindex``, ``ndrange`` is not an iterator.
You should prefer this ``ndrange`` object over ``ndindex`` as ``ndrange``
allows for multi-dimensional slicing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment belongs in the nditer docstring, but isn't needed in the ndrange one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. adapted.

@mattip
Copy link
Member
mattip commented Sep 4, 2024

Did this ever reach the mailing list as an API change?

@mattip
Copy link
Member
mattip commented Sep 4, 2024

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Sep 4, 2024

lets see how well 6 year old code survives....

@hmaarrfk hmaarrfk force-pushed the ndrange branch 2 times, most recently from 38e3c75 to 9902492 Compare September 4, 2024 20:29
@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Sep 4, 2024

I would love to run my benchmark again in 2024, but I keep running into:

Using pip 24.2 from /home/mark/miniforge3/envs/np/lib/python3.12/site-packages/pip (python 3.12)
Non-user install because site-packages writeable
Created temporary directory: /tmp/pip-build-tracker-5gzlhngk
Initialized build tracking at /tmp/pip-build-tracker-5gzlhngk
Created build tracker: /tmp/pip-build-tracker-5gzlhngk
Entered build tracker: /tmp/pip-build-tracker-5gzlhngk
Created temporary directory: /tmp/pip-install-6hyb4gpy
Created temporary directory: /tmp/pip-ephem-wheel-cache-579d6re2
Obtaining file:///home/mark/git/numpy
  Added file:///home/mark/git/numpy to build tracker '/tmp/pip-build-tracker-5gzlhngk'
  Running command Checking if build backend supports build_editable
  Checking if build backend supports build_editable ... done
  Created temporary directory: /tmp/pip-modern-metadata-z_g4s_0q
  Running command Preparing editable metadata (pyproject.toml)

  meson-python: error: Could not find meson version 0.63.3 or newer, found .
  error: subprocess-exited-with-error

I installed my environment with:

# using conda-forge.
mamba create --name np python=3.12 meson-python ninja pkg-config python-build cytho
n libblas libcblas liblapack  compilers

tips appreciated

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Sep 4, 2024

I used the power of copy and paste to obtain the following local benchmarks:

In [11]: In [3]: %%timeit
    ...:    ...: for it in np.ndrange((100, 100)):
    ...:    ...:     pass
    ...:
143 μs ± 2.51 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [12]:
    ...: In [4]: %%timeit
    ...:    ...: for it in np.ndindex((100, 100)):
    ...:    ...:     pass
    ...:    ...:

1.15 ms ± 16.4 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [13]:

In [13]: np.__version__
'2.1.1'

In [14]: import sys

In [15]: sys.version
'3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]'

@hmaarrfk hmaarrfk force-pushed the ndrange branch 2 times, most recently from 23ffe4c to 69f83af Compare September 4, 2024 21:21

Notes
-----
.. versionadded:: 2.2.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this correct??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, assuming we review and merge before the 2.2 release :)

@hmaarrfk hmaarrfk force-pushed the ndrange branch 2 times, most recently from f4e868c to 49055b5 Compare September 4, 2024 22:04
@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Sep 4, 2024

And green

@mattip
Copy link
Member
mattip commented Sep 5, 2024

@ahaldane @shoyer @eric-wieser as people involved in the original mailing list thread, any thoughts?

@mattip mattip added triage review Issue/PR to be discussed at the next triage meeting and removed 52 - Inactive Pending author response labels Sep 5, 2024
@shoyer
Copy link
Member
shoyer commented Sep 5, 2024

In the many years since that mailing list discussion, I've gotten a bit more conservative in my taste for API design.

It is still not clear to me who would use this feature. It feels like a solution that exists for the sake of "elegance" or "symmetry," not actual user needs.

If the intended users are library authors doing indexing manipulations for arrays, then @asmeurer's ndindex is a much more complete solution.

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Sep 5, 2024

There are two places where we use it:

  1. to iterate faster than ndindex. When you add a time dimensions to your problem, you can start to have one of the dimensions be in the 1000s quite easily.
  2. Ndindex feels just feature incomplete ( no skipping ). Due to this it makes it hard to use generically.

I understand if this feels niche. I can try to see how I can take the concepts of here and to make PRs to improve the performance of ndindex. Historically python 2.7 made it difficult. I couldn’t figure out how to allow for slicing and the speed up’s without breaking functionality, so I proposed ndrange.

we use it to iterate over specific dimensions of our xarray all the time.

@ahaldane
Copy link
Member
ahaldane commented Sep 5, 2024

Looking at the old discussion, this again seems like a hard PR to decide what to do with. I think a more active maintainer should make a decision BDFL-style.

I'd be 60/40 in favor of accepting. The downside is the clutter of having two functions for largely the same thing, kind of like np.ones and np.ones_like. As in that case, ndrange here would be the updated way of doing what ndindex does now, with some more flexibility. The upside is that maybe a couple people besides @hmaarrfk will find it useful. Just like many of the other obscure functions in numpy/lib, it's hard to judge the cost/benefit since since one should aim to use vectorized operations instead of ndindex anyway.

@shoyer pointed to the quantsight nditer project, which is new since this PR was opened. There's a lot of overlap, but while that project looks great it doesn't seem to have exactly the performance or convenience of an "nd version of python3's range" like ndindex or ndrange here.

@hmaarrfk
Copy link
Contributor Author
hmaarrfk commented Sep 5, 2024

I’ll close this leaving it as a good idea for one user of jumpy.

If an other user thinks it is good and causes this discussion to be revived. Then great!

If nditer or an other project takes off. All the better.

Keeping this as an internal pure python function in our own codebase is also fine.

Thank you all for your time and review.

I’m much more excited about spending our collective on performance improvement ideas I have ;)

@hmaarrfk hmaarrfk closed this Sep 5, 2024
@ahaldane
Copy link
Member
ahaldane commented Sep 6, 2024

Fair enough. It's a nice idea on its own, too bad ndindex needs backcompat.

If you have enough interest, you might consider proposing it to the quantsight project, or just put it up on github as a standalone. That might show people using it enough to upstream here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement 25 - WIP 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes triage review Issue/PR to be discussed at the next triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add indexing to ndindex
7 participants
0