8000 ENH: adds support for array construction from iterators by behzadnouri · Pull Request #5863 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: adds support for array construction from iterators #5863

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

behzadnouri
Copy link
Contributor
@behzadnouri behzadnouri commented May 10, 2015

on numpy master, iterators are not iterated in array construction:

>>> np.__version__
'1.10.0.dev0+30e3d41'
>>> np.array(range(5))
array([0, 1, 2, 3, 4])
>>> np.array(x for x in range(5))
array(<generator object <genexpr> at 0x7fed88898f30>, dtype=object)
>>> np.array(map(np.sqrt, [0, 1, 4]))
array(<map object at 0x7fed8616cb38>, dtype=object)
>>> np.array(map(np.sqrt, [0, 1, 4]), dtype='float')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: float() argument must be a string or a number, not 'map'

this is almost never what one would expect and can cause issues as in #5756, #5951, and gh-6555

with this patch:

>>> np.array(map(np.sqrt, [0, 1, 4]), dtype='float')
array([ 0.,  1.,  2.])
>>> np.array(x for x in [range(2), range(2, 4)])
array([[0, 1],
       [2, 3]])

@njsmith
Copy link
Member
njsmith commented May 10, 2015

Ehhhhh.... my inclination is to fix this by making the array-coercion logic
simpler (i.e., give an error on unrecognized input instead of returning
0-dim object arrays), rather than by making it more complex (trying to
automagically handle iterators -- and what about nested iterators etc.?).

@behzadnouri
Copy link
Contributor Author

and what about nested iterators etc.?

the very last example is a nested iterator.

EDIT: no it is not! I was wrong!

@njsmith
Copy link
Member
njsmith commented May 10, 2015

Yes, that's what I'm scared of :-)

@behzadnouri behzadnouri force-pushed the arr-ctor-iter branch 2 times, most recently from 8be100f to 1b1e8c2 Compare May 10, 2015 21:01
@behzadnouri behzadnouri changed the title ENH: adds support for array construction from non-sequence iterables ENH: adds support for array construction from iterators May 10, 2015
@behzadnouri behzadnouri force-pushed the arr-ctor-iter branch 3 times, most recently from b50d9a1 to d7df297 Compare May 17, 2015 00:34
@seberg
Copy link
Member
seberg commented Jun 8, 2015

If you have an idea of how to go forward, sure, but it doesn't seem so easy. Also your last example does not count because range is a sequence since it has a known length. But zip ignores this known length.
Frankly, I sometimes think the solution could be if python just fills in the length for zip when its input have a length (or is a true sequence, whatever that means). But maybe that would create another abundance of problems....

@behzadnouri
Copy link
Contributor Author

If you have an idea of how to go forward,

go forward w/ what? what is the problem?

Also your last example does not count because range is a sequence

that is true, and i was wrong. so the last example is not nested iterators, and this patch does not try to handle an iterator of iterators, but it does handle this kind of nesting:

>>> from itertools import imap, ifilter

>>> it = (x for x in xrange(5))
>>> pred = lambda x: x & 1
>>> op = lambda x: x * 3

>>> np.array(imap(op, ifilter(pred, it)))
array([3, 9])

@seberg
Copy link
Member
seberg commented Jun 8, 2015

Well, the one thing is always, are we sure that nobody's code may be broken by this? The other is here, whether a half fix is good. Then there is the question (I am not sure, it was extended), whether fromiter actually works generally.
This is a discussion for the mailing list probably.... I don't have much of an opinion to be honest, because I do see that the fix might also create problems.

@schlichtanders
Copy link

Dear all,

I also would appreciate seeing this branch being merged into master.

As far as I can think, the expected functionality of np.array(...) would be np.array(list(...)) or something even nicer.

As list does not handle nested iterators, this patch could be regarded as a full patch in this light.

def gen():
    yield gen()
    yield gen()

list(gen())
# [<generator object gen at 0x7f9471ae7d20>, <generator object gen at 0x7f9471ae7a00>]

Important: Further this would also handle all the rather unintuitive behaviours of np.mean, np.all, np.any, etc. which at the moment do not work (however there __builtin__ alternatives do well)

I myself use numpy a lot for interactive coding, however, especially with python3 in mind, now generators become the central object to pass around and work with. Having to write my own wrapper functions for np.array, np.mean, np.all, ... is such a big mess ... and the solution is so near: behzadnouri already made the pull request.

Considering whether this could break existing code, I cannot imagine any application where

np.array(gen())
# array(<generator object gen at 0x7f9471ae7910>, dtype=object)

would be useful. Can someone else? This of course would be a crucial point, but at least I think the change is still worth it, especially to make numpy more userfriendly for python3-style scientific interactive coding.

cheers

@njsmith
Copy link
Member
njsmith commented Nov 25, 2015

I sorta feel like if you are constantly getting generators over array data, then you are doing something suboptimal :-). What are you doing that you find yourself constantly wanting to call np.mean and np.all on generators? Generators are like for loops, but with even more overhead, so should be avoided in "numpythonic" code.

Also, there isn't really any reliable way to detect an arbitrary iterable in python, making it rather fraught to handle in a generic "do what I mean" interface like np.array.

So I'm not at all convinced that this is a good idea. But if you want to move it forward, the next step in any case would be to raise it for discussion on the mailing list. (Ideally with convincing arguments against the concerns raised in this thread :-).)

@schlichtanders
Copy link
schlichtanders commented Nov 25, 2015

thanks for the response!,

In general it seems given that it is possible to identify
iterators/generators as needed for this purpose:

  • someone actually implemented this feature already
  • there is ​type.GeneratorType​ and ​​collections.abc.Iterator​ for isinstance​(...) check
    ​- numpy can destinguish them already from all other types which get well
    translated into an numpy array​

Given this, I think the general argument goes roughly like the following:

PROS (effect maybe 10% of numpy user or more):

CONS (effect less than 0.1% numpy user I would guess):

  • might break existing code

which in total, at least for me at this stage, speaks in favour of merging
this patch or something similar into numpy master.
maybe I get convinced from the opposite.

In which mailinglist I would have to post this?
cheers

EDIT: P.S.: sorry that the code-thingies do not show up as usual - written by mail and github does not translate it accordingly

@njsmith
Copy link
Member
njsmith commented Nov 26, 2015

The mailing list is numpy-discussion, Google should find the details for
you (sorry, on my phone)

@eric-wieser
Copy link
Member

Also related to #14539

@joaoe
Copy link
Contributor
joaoe commented Feb 21, 2020

Howdy.
I reported this problem as issue.
My use case is using transformation or reduction functions, e.g. mean(iterator).
Many of numpy's functions coerce the input as a numpy array, which then produce weird results when the input is an iterator. This is also very confusing for people used to the standard functions like map, all, any, etc which can handle iterators and generators.
Thank you.

Base automatically changed from master to main March 4, 2021 02:03
@seberg
Copy link
Member
seberg commented Jul 8, 2021

I am going to close the PR since it is so old and the code has changed too much to salvage anything – besides the many open questions/discussion. Frankly, maybe we should rather reject iterators and improve (and in the error advertise) np.fromiter.

@seberg seberg closed this Jul 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
0