8000 NEP: add np.allow_object to the proposal by mattip · Pull Request #15083 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

NEP: add np.allow_object to the proposal #15083

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
NEP: rename sentinal to 'allow_object'
  • Loading branch information
mattip committed Dec 15, 2019
commit 08d05c1c3a73f0a88528e70c20ee7eab85bd3f57
23 changes: 12 additions & 11 deletions doc/neps/nep-0034.rst
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ level have the same length) will force users who actually wish to create
and ``nd.ndarrays`` are all sequences [0]_. See for instance `issue 5303`_.
Copy link
Member
@eric-wieser eric-wieser Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and ``nd.ndarrays`` are all sequences [0]_. See for instance `issue 5303`_.
and ``np.ndarrays`` are all sequences [0]_. See for instance `issue 5303`_.

Since this paradigm seems to be in wide use in libraries like pandas, scipy,
and matplotlib, a work-around via ``np.array([<ragged_nested_sequence>],
dtype=np.allow_ragged)`` will be added, with the intent that the work-around
dtype=np.allow_object)`` will be added, with the intent that the work-around
will be removed at some time in the future.

Usage and Impact
Expand Down Expand Up @@ -87,15 +87,15 @@ Implementation
--------------

The code to be changed is inside ``PyArray_GetArrayParamsFromObject`` and the
internal ``discover_dimentions`` function. See `PR 14794`_.
internal ``discover_dimensions`` function. S 8000 ee `PR 14794`_.

Backward compatibility
----------------------

Anyone depending on creating object arrays from ragged nested sequences will
need to modify their code. There will be a deprecation period during which the
current behaviour will emit a ``DeprecationWarning``. Additionally, a sentinal
value ``np.allow_ragged`` will use the pre-NEP-34 behaviour without a warning.
value ``np.allow_object`` will use the pre-NEP-34 behaviour without a warning.
Once the implications of the change are well-understood, the sentinal value
could be removed.

Expand Down Expand Up @@ -135,14 +135,15 @@ Comments to `issue 5303`_ indicate this is unintended behaviour as far back as
2014. Suggestions to change it have been made in the ensuing years, but none
have stuck. The implementation in `PR 14794`_ seemed to point to the
viability of this approach. However this proved very disruptive to downstream
library authors, so the idea to add a ``np.allow_ragged`` sentinal was added.

The name is specific to ragged nested sequences. In discussion we considered a
more general ``np.allow_object``. That is over-general and would not
differentiate between ``np.array([Decimal(10), Decimal(10)])`` and the ragged
nested sequence ``np.array([Decimal(10), [1, Decimal(10)]])``. Currently the
NEP does not propose disallowing the latter without an explicit
``np.allow_ragged``, but a future enhancement could.
library authors, so the idea to add a ``np.allow_object`` sentinal was added.

The name is not specific to ragged nested sequences. In discussion we
considered a more specific ``np.allow_ragged``. However it was felt that there
is really no difference between an object array with ragged sequences and an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion there is a difference, it's that the former is ambiguous while the latter is not - today there is no reason you would pass allow_object unless you care about ragged sequences, so it seems like the name should clearly indicate that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure that's not true, there were lots of examples given in the call about other things that could be used inside object arrays (e.g. arrays of Decimal or Fraction was @charris's example).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there are lots of examples of object arrays, as far as I understood, the reasoning is:

  1. Try to have as few new names in the API as possible
  2. We may deprecate automatic conversion to objects in general (or at least in more generally
  3. It is likely OK, if (once we start deprecating more) allow_object will cover ragged-sequences as well (meaning we likely do not need an "allow objects but not ragged-sequences")

With those considerations (at least for me) I can see that allow_object as a single catch all "legacy object" behaviour may be desirable. I will admit I am still a bit unsure if a single object is sufficient granularity, and that an "OptIn" for getting an error right now is not useful. But if pandas will not have a use-case soon, my guess is it does not matter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there were lots of examples given in the call about other things that could be used inside object arrays.

Right, but this NEP doesn't give any justification for why we would want to deprecate np.array([Decimal(1), Decimal(2)]) in future, yet chooses a spelling that assumes we would want to do so. If we're going to pick that spelling, we should at least speculate some other situations that would require it to be passed within the NEP. Currently, I can't think of a convincing use case other than nested sequences.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one clear justification: I think we may want to deprecate it, because people should be using np.array([Decimal(2), Decimal(2)], dtype=np.PyObject[Decimal]), i.e. making type specific object dtypes should be easier and should be preferred in many cases. (of course the result of operations on these, unless defined by the user could spill to a generic object quickly, so it may not be super straight forward always).

Copy link
Member
@eric-wieser eric-wieser Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. making type specific object dtypes should be easier and should be preferred in many cases.

Given that np.array([1, 1]) infers to np.int_, wouldn't inferring np.array([Decimal(1), Decimal(1)]) to decimal to np.PyObject[Decimal] also be reasonable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eric-wieser yes, but I think someone may be expected to define that dtype first probably. Since you could also imagine someone creating a non-object version of Decimal which is much faster. (There is always the issue if two people define dtypes for the same python type, breaking the 1:1 mapping of python type to dtype, we could error/warn when that happens, I am not sure it is a huge deal. Or we cannot allow the above pattern...).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to deprecate it, because people should be using ...

I don't think it makes sense to me to throw the people who can't yet be bothered to define their own extension type into the same bucket as people who are deliberately using ragged arrays.

By picking a generic name here that could apply to multiple deprecations (of which we only have remotely concrete plans for the first), either:

  • We never create a second deprecation that makes use of this sentinel, and for its entire lifetime allow_object is a less precise way to spell allow_ragged
  • We create a second and maybe third deprecation. We can't finish the first one until we finish all the later ones, as we didn't introduce a new spelling for each.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was drifting of a bit too far. That deprecation is not concrete yet. Even if it will happen eventually, it would be at a time where this one is at least close to being done probably.

Well, there is at least one concrete one, we could do soon, the promotion of python integers to object (or unsigned/larger integer). I will be frank, I am not quite convinced myself that saving one sentinel here is worth the trouble. For example, I could rather imagine hiding the sentinel somewhere, where it is less spammy (I don't know, np.dtype.allow_ragged for all I care).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And to be very clear: I am also not convinced that it is so bad to call it allow_object, that I am willing to argue seriously against it :).

object array like ``np.array([Decimal(10), Decimal(10)])``. There was some
discussion of the possibility of a future NEP allowing syntax like
``np.array([Decimal(10), Decimal(10)], dtype=Decimal)``, but again that is out
of scope for the current NEP.

References and Footnotes
------------------------
Expand Down
0