ENH: Allow genfromtxt to unpack structured arrays#16650
ENH: Allow genfromtxt to unpack structured arrays#16650rossbar merged 13 commits intonumpy:masterfrom
Conversation
There was a problem hiding this comment.
LGTM, just a couple minor things. Thanks @AndrewEckart !
numpy/lib/npyio.py
Outdated
There was a problem hiding this comment.
Why the len(names) > 1 condition?
There was a problem hiding this comment.
If the goal is to match squeeze, then len(names) != 1 would work better for when names == [].
There was a problem hiding this comment.
I can change it, but I think names == [] necessarily implies output is empty (i.e. the input was an empty file).
Consider the following two calls:
np.genfromtxt(StringIO(''), dtype=None, unpack=True)
np.genfromtxt(StringIO(''), dtype={'names': (), 'formats': ()}, unpack=True)
With len(names > 1), both will return an empty ndarray , but with len(names) != 1), both will return an empty list.
numpy/lib/tests/test_io.py
Outdated
There was a problem hiding this comment.
Overall, LGTM! would be also useful to add a test with a mix of integer and string to make sure that the field array dtypes are being discerned correctly.
numpy/lib/tests/test_io.py
Outdated
There was a problem hiding this comment.
Here's what I get with this example (using StringIO instead of the internal TextIO):
In [37]: dt = np.dtype({'names': ('a',), 'formats': ('<i4',)})
In [38]: x = np.genfromtxt(StringIO('1'), dtype=dt, unpack=True)
In [39]: x
Out[39]: array((1,), dtype=[('a', '<i4')])
I don't think the dtype of the return value should still be the structured dtype in this case--unpack=True is supposed to unpack the fields. So I expect this output to be array(1, dtype=int32). In other words, the structure should still be pulled apart even when len(names) == 1.
There was a problem hiding this comment.
Good point. I modified the logic to return the single unpacked ndarray when len(names) == 1 and a list of ndarrays when len(names > 1), which I think is a little bit cleaner. Technically, len(names) == 0) is still possible with an empty input file and a structured dtype, but this already gives a UserWarning.
Previously, `genfromtxt` failed to transpose its output when `unpack=True` and `dtype` was structured (or `dtype=None` and the inferred `dtype` was structured). This patch resolves the issue by returning a list of arrays, as in `loadtxt`. Includes tests and updates the docstring to match `loadtxt`. Fixes numpy#4341
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
|
I think this is probably fine, although did not look too careful. Maybe @WarrenWeckesser can push this through? It seems that this does change behaviour fairly strongly? Probably the old behaviour can be defined as a clear bug here? @AndrewEckart could you comment briefly on that for me, and add a release note unless it was a loud error before? The release notes go into |
I'm not sure how to characterize it. In #4341, it's described as a bug, and the old behavior is contrasted with |
|
Definitely needs a release note IMO. Someones code will probably break somewhere. We have to make a call that it is very few people... |
Would you consider this |
|
Lets put it under compat, since the actual improvement is pretty straight forward (and arguably even a bug fix). |
| one for each column:: | ||
|
|
||
| >>> np.genfromtxt(StringIO("1 2.0 \n 3 4.0"), unpack=True) | ||
| [array([1, 3]), array([2., 4.])]] |
There was a problem hiding this comment.
I think this example requires a structured dtype= being passed in? It may be good to point/show an example which triggers the changed behaviour without passing in dtype= (if that is possible).
There was a problem hiding this comment.
(i.e. I tried the example on this branch, and the behaviour is not as written here – also there is an additional closing bracket)
There was a problem hiding this comment.
Sorry about that, it looks like the numbers in my example were too small so the inferred dtype was '<u4'. I replaced it with the test case for dtype=None.
There was a problem hiding this comment.
Maybe I have to take a closer look at the tests, or am I running the wrong code? But it seems to me that this gives:
In [2]: from io import StringIO
In [3]: np.genfromtxt(StringIO("21 58.0 \n 35 72.0"), unpack=True)
Out[3]:
array([[21., 35.],
[58., 72.]])
OTOH, this one changed behaviour as expected:
In [5]: np.genfromtxt(StringIO("21 58.0 \n 35 72.0"), unpack=True, dtype="i,d")
so, right now I think the example is still incorrect, and if there are tests for this, they should probably be refined to check that the result is (correctly, I assume) a single array in the first case, since it should be a single array when there is no structured dtype detected (which seems to be the default?).
Unless this should change and I am missing something?
There was a problem hiding this comment.
Nope, you were running the right code; it turns out that the example was incorrect. I forgot that detection of a structured dtype relies on passing dtype=None to override the default dtype=float (otherwise, a single array is the correct return value). My example in the release note failed to specify the dtype argument. Should be correct now.
|
Unfortunately CircleCI does not merge-from-master before building. Could you either merge from master or rebase to clear the non-related failure? |
There was a problem hiding this comment.
I took the liberty of expanding the release note a bit. There are a couple things that need addressing in the updated tests
Changes introduced after initial approval
There was a problem hiding this comment.
LGTM - thanks for sticking with this @AndrewEckart !
|
Hi, this change means that genfromtxt now ignores The example I am thinking of is, say we have a simple csv or space delimited document with 4 columns: val1 val2 valignore1 valignore2 I read this in using Now we get [array(val1, val3, val5), array(val1, val3, val5)], previously it would be an array of arrays with names (used to be called recarray I think but maybe that name has changed). |
Previously,
genfromtxtfailed to transpose its output whenunpack=Trueanddtypewas structured (ordtype=Noneand theinferred
dtypewas structured). This patch resolves the issue byreturning a list of arrays, as in
loadtxt. Includes tests and updatesthe docstring to match
loadtxt.Fixes #4341