-
-
Notifications
You must be signed in to change notification settings - Fork 11k
BUG: Use __array__
during dimension discovery
#14995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
numpy/core/src/multiarray/ctors.c
Outdated
if (slen < 0) { | ||
/* Try __array__ before using s as a sequence */ | ||
PyObject *tmp = _array_from_array_like( | ||
s, /*dtype*/ NULL, /*writeable*/ 0, /*context*/ NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Tempting to change this to match the other call exactly, either as _array_from_array_like(obj, NULL, NPY_FALSE, NULL);
here or this way there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, presonally I now use an IDE which actually adds hints for the names here, giving it double...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen a former coworker use that before, it looked pretty cool
Is a backport intended here? |
There was already a reversion in 1.17, but not in master. I had a reversion for the 1.18 release, but it was defective. This fixes the failed reversion with a new approach for 1.18. |
1e51a72
to
7ca95cd
Compare
Overall looks like some nice cleanup, especially if you can get it to pass CI |
Passing our CI should not be a problem. Pandas seems to be acting up a bit though, I have to investigate if its us or them (or just bad tests). |
@jbrockmendel, @superbobry maybe you can help me out understand pandas a bit better here? On the new branch pandas tests fail which do:
The code seems to call The part that I would like to understand is why it currently works (in some weird twisted way), when |
Yeah, but I think it should call |
These lines also look relevant: https://github.com/pandas-dev/pandas/blob/ea2e26ae7d700d7fd363ea5bfc05d2fe3fb8a5ee/pandas/core/dtypes/cast.py#L1381-L1385 Can you paste the traceback of the failure? |
Oh man, thanks yeah. I saw that function being used, but somehow did not expect it to be hit in one branch vs. the other... Error with this branch
In [1]: import pandas as pd
^[[A
In [2]: df1 = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})
In [3]: pd.DataFrame([df1, df1])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-91cf2bad0f2e> in <module>()
----> 1 pd.DataFrame([df1, df1])
/home/sebastian/.local/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
462 mgr = arrays_to_mgr(arrays, columns, index, columns, dtype=dtype)
463 else:
--> 464 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
465 else:
466 mgr = init_dict({}, index, columns, dtype=dtype)
/home/sebastian/.local/lib/python3.7/site-packages/pandas/core/internals/construction.py in init_ndarray(values, index, columns, dtype, copy)
169 # by definition an array here
170 # the dtypes will be coerced to a single dtype
--> 171 values = prep_ndarray(values, copy=copy)
172
173 if dtype is not None:
/home/sebastian/.local/lib/python3.7/site-packages/pandas/core/internals/construction.py in prep_ndarray(values, copy)
293 values = values.reshape((values.shape[0], 1))
294 elif values.ndim != 2:
--> 295 raise ValueError("Must pass 2-d input")
296
297 ret
8000
urn values
ValueError: Must pass 2-d input Without this branch it returns a Dataframe with dataframes as entries, which is silly, but... Anyway, I will have a closer look if pandas can do something. But could be nice to have feedback from pandas whether they care about this or think its OK to error out... |
Can you run |
whoops, I think I misread the code at the very start, its EDIT: To be clear: try:
if is_list_like(values[0]) or hasattr(values[0], 'len'): # <-- is hit
# following convert does nothing; `np.array()` than raises Error...
values = np.array([convert(v) for v in values])
elif isinstance(values[0], np.ndarray) and values[0].ndim == 0:
# GH#21861
values = np.array([convert(v) for v in values])
else:
values = convert(values)
except (ValueError, TypeError):
values = convert(values) # <-- Ends up getting called and forces object array. |
Pushing this off. Will fix up the reversion which will also probably mess up the merge. |
Yeah, I was starting to figure the same. I can do the fixup if you prefer. |
I would like to try this early after the split then, so that we can see if pandas is OK/can keep up with it. |
@seberg, the way I read it |
@eric-wieser yeah, but the new path that is taken with this never tries calling I have opened pandas issue at pandas-dev/pandas#29978 |
7ca95cd
to
8a1a80f
Compare
@seberg ping |
497c36a
to
50612a5
Compare
OK, updated and release notes added. Maybe we should just try and see where it brings us with respect to pandas? I assume they are the largest "users" of this behaviour, and it seemed like all of the test failures they have is nonsense behaviour that they should be deprecating anyway. In the end, it is hard to test unless someone from the pandas side is willing to get this branch, so I would be for moving forward and reverting if necessary? |
50612a5
to
766b046
Compare
I'll pull this and run our npdev build against it now. |
766b046
to
d6ba4fb
Compare
Did some style fixups and comment expansion (sorry, I did not mean to, but my muscle reflexes force pushed). The refcounting was correct, since it actually is a borrowed reference until the end of that function (public API, cannot be changed, I added a comment, which may be a bit long but...). @jbrockmendel thanks for checking! Good to know that you think those pandas tests can be simply fixed up and should not be a problem. |
d6ba4fb
to
17cff3b
Compare
Release notes pushed (forgot to add it to the commit). |
17cff3b
to
9373cf3
Compare
``__array__`` was previously not used during dimension discovery, while bein gused during dtype discovery (if dtype is not given), as well as during filling of the resulting array. This would lead to inconsistencies with respect to array likes that have a shape including a 0 (typically as first dimension). Thus a shape of ``(0, 1, 1)`` would be found as ``(0,)`` because a nested list/sequence cannot represent empty shapes, except 1-D. This uses the `_array_from_array_like` function, which means that some coercions may be tiny bit slower, at the gain of removing a lot of complex code. (this also reverts commit d0d250a or the related change). This is a continuation of work by Sergei Lebedev in numpygh-13663 which had to be reverted due to problems with Pandas, and the general inconsistency. This version may not resolve all issues with pandas, but does resolve the inconsistency. Closes numpygh-13958
9373cf3
to
48dbe84
Compare
Fixed the merge conflict, and tweaked the release note a bit. Do we have to discuss whether this should go in as a bug fix. If this is changes behaviour for some (arguably buggy code), I feel that in almost all cases it should lead to quick crashes, or at least very obvious wrong results. |
Thanks @seberg |
__array__
was previously not used during dimension discovery,while bein gused during dtype discovery (if dtype is not given),
as well as during filling of the resulting array.
This would lead to inconsistencies with respect to array likes
that have a shape including a 0 (typically as first dimension).
Thus a shape of
(0, 1, 1)
would be found as(0,)
becausea nested list/sequence cannot represent empty shapes, except 1-D.
This uses the
_array_from_array_like
function, which means thatsome coercions may be tiny bit slower, at the gain of removing
a lot of complex code.
(this also reverts commit d0d250a)
Closes gh-13958
I am a bit confused as to why I thought before that just adding
__array__
would not simply solve this. I think it does. I could add the use of_array_from_array_like
incommon.h
during dtype discovery.Note that this changes behaviour, so testing against pandas may be in order. And I suppose it needs a release note, I will probably add those tonight.