-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
NEP: add np.allow_object to the proposal #15083
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,6 +30,10 @@ recursive sequence of sequences, where not all the sequences on the same | |
level have the same length) will force users who actually wish to create | ||
``object`` arrays to specify that explicitly. Note that ``lists``, ``tuples``, | ||
and ``nd.ndarrays`` are all sequences [0]_. See for instance `issue 5303`_. | ||
Since this paradigm seems to be in wide use in libraries like pandas, scipy, | ||
and matplotlib, a work-around via ``np.array([<ragged_nested_sequence>], | ||
dtype=np.allow_object)`` will be added, with the intent that the work-around | ||
will be removed at some time in the future. | ||
|
||
Usage and Impact | ||
---------------- | ||
|
@@ -71,24 +75,29 @@ all of these will be rejected: | |
Related Work | ||
------------ | ||
|
||
`PR 14341`_ tried to raise an error when ragged nested sequences were specified | ||
with a numeric dtype ``np.array, [[1], [2, 3]], dtype=int)`` but failed due to | ||
false-positives, for instance ``np.array([1, np.array([5])], dtype=int)``. | ||
`PR 14341`_ tried to change the error reported when ragged nested sequences | ||
were specified with a numeric dtype ``np.array([[1], [2, 3]], dtype=int)`` but | ||
failed due to false-positives when assigning a mixed scalar-ndarray list to an | ||
array (`issue 14138`_) | ||
|
||
.. _`PR 14341`: https://github.com/numpy/numpy/pull/14341 | ||
.. _`issue 14138`: https://github.com/numpy/numpy/issue/14138 | ||
|
||
Implementation | ||
-------------- | ||
|
||
The code to be changed is inside ``PyArray_GetArrayParamsFromObject`` and the | ||
internal ``discover_dimentions`` function. See `PR 14794`_. | ||
internal ``discover_dimensions`` function. See `PR 14794`_. | ||
|
||
Backward compatibility | ||
---------------------- | ||
|
||
Anyone depending on creating object arrays from ragged nested sequences will | ||
need to modify their code. There will be a deprecation period during which the | ||
current behaviour will emit a ``DeprecationWarning``. | ||
current behaviour will emit a ``DeprecationWarning``. Additionally, a sentinal | ||
value ``np.allow_object`` will use the pre-NEP-34 behaviour without a warning. | ||
Once the implications of the change are well-understood, the sentinal value | ||
could be removed. | ||
|
||
Alternatives | ||
------------ | ||
|
@@ -113,18 +122,34 @@ Alternatives | |
the current NEP. Rationale: it's harder to asses the impact of this larger | ||
change, we're not sure how many users this may impact. | ||
|
||
- The issue of ``np.array([0.3, [0.3]])`` (or more commonly ``np.array([0.3, | ||
np.array([0.3])])`` creating an object array rather than a ``float64`` array | ||
was discussed in `issue 15075`_. This too is out of scope for the current | ||
NEP. | ||
|
||
|
||
Discussion | ||
---------- | ||
|
||
Comments to `issue 5303`_ indicate this is unintended behaviour as far back as | ||
2014. Suggestions to change it have been made in the ensuing years, but none | ||
have stuck. The WIP implementation in `PR 14794`_ seems to point to the | ||
viability of this approach. | ||
have stuck. The implementation in `PR 14794`_ seemed to point to the | ||
viability of this approach. However this proved very disruptive to downstream | ||
library authors, so the idea to add a ``np.allow_object`` sentinal was added. | ||
|
||
The name is not specific to ragged nested sequences. In discussion we | ||
considered a more specific ``np.allow_ragged``. However it was felt that there | ||
is really no difference between an object array with ragged sequences and an | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In my opinion there is a difference, it's that the former is ambiguous while the latter is not - today there is no reason you would pass allow_object unless you care about ragged sequences, so it seems like the name should clearly indicate that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Pretty sure that's not true, there were lots of examples given in the call about other things that could be used inside object arrays (e.g. arrays of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, there are lots of examples of object arrays, as far as I understood, the reasoning is:
With those considerations (at least for me) I can see that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Right, but this NEP doesn't give any justification for why we would want to deprecate There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is one clear justification: I think we may want to deprecate it, because people should be using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Given that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @eric-wieser yes, but I think someone may be expected to define that dtype first probably. Since you could also imagine someone creating a non-object version of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to me to throw the people who can't yet be bothered to define their own extension type into the same bucket as people who are deliberately using ragged arrays. By picking a generic name here that could apply to multiple deprecations (of which we only have remotely concrete plans for the first), either:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I was drifting of a bit too far. That deprecation is not concrete yet. Even if it will happen eventually, it would be at a time where this one is at least close to being done probably. Well, there is at least one concrete one, we could do soon, the promotion of python integers to object (or unsigned/larger integer). I will be frank, I am not quite convinced myself that saving one sentinel here is worth the trouble. For example, I could rather imagine hiding the sentinel somewhere, where it is less spammy (I don't know, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And to be very clear: I am also not convinced that it is so bad to call it |
||
object array like ``np.array([Decimal(10), Decimal(10)])``. There was some | ||
discussion of the possibility of a future NEP allowing syntax like | ||
``np.array([Decimal(10), Decimal(10)], dtype=Decimal)``, but again that is out | ||
of scope for the current NEP. | ||
|
||
References and Footnotes | ||
------------------------ | ||
|
||
.. _`issue 5303`: https://github.com/numpy/numpy/issues/5303 | ||
.. _`issue 15075`: https://github.com/numpy/numpy/issues/15075 | ||
.. _sequences: https://docs.python.org/3.7/glossary.html#term-sequence | ||
.. _`PR 14794`: https://github.com/numpy/numpy/pull/14794 | ||
.. _`another library`: https://github.com/scikit-hep/awkward-array | ||
|
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.