NEP: Add zero-rank arrays historical info (numpy/numpy PR #12166)

Conversation
:Author: Alexander Belopolsky (sasha), transcribed Matt Picus <matti.picus@gmail.com>
:Status: Draft
:Type: Informational
:Created: 2018-10-14
I wonder if we want to backdate this
Yeah, let's put the original date here -- maybe noting the date it was transcribed, too?
ok
~~~~~~~~~~~~~~~~~~~~~~~~

Sasha started a `Jan 2006 discussion`_ on scipy-dev
with the folowing proposal:
Typo
fixed
.. _`2006 wiki entry`: https://web.archive.org/web/20100503065506/http://projects.scipy.org:80/numpy/wiki/ZeroRankArray
.. _`history`: https://web.archive.org/web/20100503065506/http://projects.scipy.org:80/numpy/wiki/ZeroRankArray?action=history
.. _`2005 mailing list thread`: https://sourceforge.net/p/numpy/mailman/message/11299166
Presumably this must have a pipermail URL too?
Strangely enough I could not find it. Seems to be lost in the transition?
Nice archaeology recovering most of the other links!

Wow, very interesting. Thanks for the revival! This helps a lot for thinking about the arrayprint code we were working on in the last year, since the 0d vs scalar distinction is very important there.

Interesting, though I must admit the text leaves me still puzzled about why 0-d arrays cannot be used also in place of scalars, i.e., why we need the scalar types at all.
@mhvk: Agreed. I think the argument about

As an aside - the way we make the scalars be subclasses of python scalars is forbidden by the CPython docs (#11998), so if we can, I'd prefer to drop these base classes.
The email https://web.archive.org/web/20100501162447/http://aspn.activestate.com:80/ASPN/Mail/Message/numpy-discussion/3028210 linked in #12164 (comment), which is all about indexing vs projection, did make it clearer why one logically could have two types. Am still not sure why we couldn't get rid of numpy scalars, though... Is there more than being immutable? [Edited to correct links]

That email link doesn't work for me. Not sure how important it is, but another case that makes eliminating scalars difficult is object arrays. If the user does
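The object-array difficulty can be pinned down with a small sketch (a hedged illustration, assuming current NumPy semantics, not code from the thread): indexing an object array hands back the stored Python object itself, so there is no NumPy scalar type that could be returned.

```python
import numpy as np

# A 0-d object array wrapping an arbitrary Python object.
a = np.array({"k": 1})
assert a.shape == () and a.dtype == object

# Full indexing returns the stored object itself, not a NumPy scalar.
item = a[()]
assert isinstance(item, dict) and item == {"k": 1}
```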
@ahaldane - yes, object arrays do stand out again. I guess one would either need to special-case object arrays (not crazy; suddenly the

Anyway, not specifically arguing that it should change, just noting that the text to me doesn't provide a particularly strong rationale for having both - the e-mail (now correctly linked) was clearer.
Hehe, derailing discussion, I like it ;) (sorry, needed to distract myself for a few minutes). I do believe that immutability and in-place behaviour are good enough reasons! I also believe that "there should be only one obvious way" is a red herring. Of course a future numpy could remove all scalars; I frankly believe that scalars have a lot of uses. Often they should even have different semantics (

Now why I think it is a red herring: I do think that most of the time if you want something 0D, a scalar is what you want, assuming you do not argue against the fact that scalars are typically a bit nicer. Having 0D arrays as first class citizens is not an issue, for the simple reason that they don't randomly appear. So there is one obvious way. Most of the time it is the scalar, and sometimes, when you need for example mutability, the array will be the obvious solution and a blessing.

One thing I disagree with is that scalars need indexing. I believe the only reason they do need it is because 0D arrays are not first class, and I challenge anyone to give me an example where fuzzing out the distinction is helpful ;). On the other hand, maybe I just like consistency too much :).
Indexing of Zero-Rank Arrays
----------------------------
Can we add a note that all these indexing operations have been implemented?
In [2]: x = np.array(1)
In [3]: x[...]
Out[3]: array(1)
In [4]: x[np.newaxis]
Out[4]: array([1])
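Extending the session above into a checkable sketch (behavior assumed from a recent NumPy; `x[()]` is included for completeness alongside the two operations shown):

```python
import numpy as np

x = np.array(1)                         # zero-rank array

assert isinstance(x[...], np.ndarray)   # x[...] stays a 0-d array
assert x[...].shape == ()
assert x[np.newaxis].shape == (1,)      # np.newaxis promotes to rank 1
assert np.isscalar(x[()])               # x[()] extracts a numpy scalar
```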
I'm not sure if we want to modify this to reflect the current state of things - I think PEPs tend to reflect the time of their writing, and not how things ended up being implemented.
Maybe just change the date and then add a note at the top briefly explaining the state of things? It's just a little weird to see a NEP dated today that explains outdated behavior.
added a note to the abstract
@@ -0,0 +1,238 @@
=========================
NEP 16 — Zero Rank Arrays
let's save NEP 16 for #10706
ok, so this becomes NEP 27.
Looks good, thanks!
See SVN changeset 1864 (which became git changeset `9024ff0`_) for
implementation of ``x[...]`` and ``x[()]`` returning numpy scalars.

See SVN changeset 1866 (which became git changeset `743d922`_) for
A nit: `commit` is more natural a word for git - I think `changeset` is a trac / SVN term
fixed
NEP 27 — Zero Rank Arrays
=========================

:Author: Alexander Belopolsky (sasha), transcribed Matt Picus <matti.picus@gmail.com>
cc @abalkin
@seberg - thanks for the insight! I was convinced, at least for a moment, until I remembered again the great pain any subclass has to go through in ensuring that it can also provide scalars (the
represent scalar quantities in all case. Pros and cons of converting rank-0
arrays to scalars were summarized as follows:

- Pros:
I wouldn't really suggest changing this doc, but I think this section in particular is out of date.
The main argument I've heard for scalar types is speed -- they are significantly faster than working with 0-d arrays.
The first and third "Pros" here are no longer true. With Python 3, Python uses `operator.index()` for coercing integers, and NumPy scalars can't be relied upon to subclass Python types (e.g., `isinstance(np.int64(1), int)` -> `False`).
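The `operator.index()` point can be demonstrated directly (a sketch assuming a Python 3 build where `np.int64` does not subclass `int`, which is the case on 64-bit Linux and Windows):

```python
import operator
import numpy as np

# NumPy integers are not subclasses of Python int on Python 3...
assert not isinstance(np.int64(1), int)

# ...yet they still work wherever an index is required, via __index__.
assert operator.index(np.int64(2)) == 2
assert ["a", "b", "c"][np.int64(1)] == "b"
```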
I find it unlikely that scalars are faster, given that most of our operations start by casting them to 0d arrays. Perhaps the arithmetic has a fast path.
so is the conclusion that we should get rid of scalars once we move to python 3?
I don't agree with that conclusion - I also think that python 2.7 already uses `__index__`, so if there's a cut-off line here we've already crossed it.

One thing I would like to see is a merge of the scalar types and dtypes - so that `isinstance(np.dtype, type)` is true, and `isinstance(np.float64, np.dtype)` is also true. But that's blocked by #11998 right now.
Scalars absolutely have a fast path. It is in numpy/numpy/core/src/umath/scalarmath.c.src among other places.
@eric-wieser - Scalars are indeed fast-tracked for arithmetic (`scalarmath.c.src`):

In [3]: a = np.array(1.)
In [4]: %timeit a * a
1000000 loops, best of 3: 541 ns per loop
In [5]: a = np.float64(1.)
In [6]: %timeit a * a
10000000 loops, best of 3: 91.6 ns per loop
Looks good to merge to me now
Yes, +1 from me as well
I would love to get rid of scalars in favor of 0d arrays, but that should certainly require another NEP :). There are some backwards compatibility concerns to consider.
I'll mention that one other reason to get rid of scalars is that they make it quite challenging to type annotate NumPy code. We'd really love to be able to say that np.add() returns a numpy array when passed two numpy arrays, but that's only true if the result is non-scalar.
I don't think we need to get rid of scalars for that - we just need to stop producing them for anything but indexing
Indeed, I am convinced that most/all of the annoyances about scalars have nothing to do with scalars, but just with the way that
There are other operations that legitimately convert arrays into scalars, such as reductions. I suspect users would be confused if indexing and reductions (e.g.,

In practice, I suspect the main objection to using 0d arrays extensively would be that users start seeing
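The reduction behavior under discussion can be made concrete with a short sketch (hedged: this assumes current NumPy semantics, where full reductions produce scalars and axis reductions produce arrays):

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)

# A full reduction hands back a numpy scalar, not a 0-d array.
total = a.sum()
assert np.isscalar(total) and not isinstance(total, np.ndarray)

# A partial reduction keeps an array, with one fewer dimension.
assert a.sum(axis=0).shape == (3,)
```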
@shoyer I believe reductions are basically a null argument. Because
I think the main problem is speed, in particular, ufunc call overhead. That is why
Of course we could change this, and make 0d arrays print like scalars. The arrayprint code already special-cases 0d arrays and prints them using the scalar-print path (different from the array-print code-path) which uses higher precision:

>>> str(np.array(np.pi))
'3.141592653589793'
>>> str(np.array([np.pi]))
'[3.14159265]'
Yep, this would be a reasonable compromise. If we did this and kept around scalars as array constructors (e.g.,

This has always felt like a strange optimization to me. If you want maximum speed with pure Python code, you are better off using Python's built-in scalars (e.g., 2x faster for multiplication and 10x (!) faster for

This doesn't leave a very big niche for NumPy scalars -- only cases where you want the exact dtype semantics of numpy or where you want to use one of the rare ufuncs without an equivalent in the standard library's
I will just note again that I am -1 on even hoping or planning to get rid of scalars (except from an implementation point of view); I have serious trouble seeing the point. I think scalars are the "expected" thing most of the time (mutability, hashability). 0-D arrays should not be promoted more; rather they should exist as a rarely used niche that most users will not run into because they don't need it and they won't typically create them accidentally. One that is still useful for those who happen to need it.

EDIT: Of course for many array "dtypes" the associated scalar could in a sense be a python integer or float, but not sure that helps much. Also, of course it is not like I am sure, but I feel the arguments for no scalars are not quite along the line of what would be the best end-point but more on what the current problems appear to be.
IIRC, @rkern did the original implementation on account of complaints about speed.

I believe that @teoliphant suggested adding a dictionary to ndarray at some point.
Putting this in, since there seem to be no further comments on the state of the NEP contents.
@eric-wieser - thanks for mentioning me in this thread. I agree with @seberg that scalars are necessary because, due to hashability, they can be used in places where 0-dim arrays cannot. This is covered in the NEP.

One way to reduce the number of types that do the same thing slightly differently would be to try to sneak numpy scalars into the python core library under the guise of ctypes scalars. The ctypes module is showing its age and I think could use the expertise of the numpy community.

After rereading the NEP, I don't have any corrections other than maybe replacing "Sasha" with my full name in a few places. :-)
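The hashability point is easy to verify (a minimal sketch, assuming current NumPy): scalars are immutable and hashable, while 0-d arrays are mutable and therefore rejected as dict keys.

```python
import numpy as np

s = np.float64(1.5)           # numpy scalar: immutable and hashable
d = {s: "value"}
assert d[1.5] == "value"      # hashes consistently with the builtin float

a = np.array(1.5)             # 0-d array: mutable, hence unhashable
raised = False
try:
    hash(a)
except TypeError:
    raised = True
assert raised
```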
Thanks @abalkin for creating this document in the first place. Since this is still a draft, I will issue a new PR to make this accepted and do
One change that's outdated.
array(20)

Indexing of Zero-Rank Arrays
----------------------------
I would suggest that this section be removed entirely or updated. For example, if `x` is either an array scalar or a rank zero array, `x[...]` is guaranteed to be an array and `x[()]` is guaranteed to be a scalar. The difference is because `x[{anything here}, ...]` is guaranteed to be an array. In words, if the last index is an ellipsis, the result of indexing is guaranteed to be an array.

I came across this weird behaviour when implementing the equivalent of `np.where` for PyData/Sparse.
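The trailing-ellipsis rule described above can be checked with a small sketch (behavior assumed from a recent NumPy; not code from the thread):

```python
import numpy as np

x = np.array(3.0)                     # rank-zero array
assert isinstance(x[...], np.ndarray) and x[...].shape == ()
assert np.isscalar(x[()])

# The rule generalizes: a trailing ellipsis forces an array result.
y = np.arange(4.0)
assert isinstance(y[2, ...], np.ndarray) and y[2, ...].shape == ()
assert np.isscalar(y[2])
```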
I don't think we should remove sections from past documents because they no longer apply. NEPs document the state of things when they were written, not the state of `master`. Perhaps a `.. note::` could go here
Worth keeping at least one mention of
You're probably right from a backward compatibility point of view, but I don't agree here. It's not a huge burden to need to convert from arrays to built-in scalars, and users would quickly learn they need to call

The way that NumPy intentionally conflates NumPy/builtin scalars (by giving them the same repr) is a recurrent source of confusion that in my experience has led to lots of bugs.
@shoyer, yes that is the other option that may be clean. The question is if indexing should always return an array again, which breaks 1d compatibility with lists and will be confusing as well. Object is maybe a bit tricky, but the only real issue I see is that it might be hard to know if something is a scalar or not right now.

Btw. I disagree a bit that there might be no movement possible here. If – I guess a big if – we make progress on new dtype support we have to get it right and I do believe we will have some wiggle room. We could create a
If I may, from the perspective of creating code that works with any number of dimensions, it's very nice to have 0D arrays. Many a time I've had issues with the fact that something returns one result if it's 0-D and another when it's of a higher dimension. I have no issues with scalars so long as they behave the same as arrays in any and all ways, that is,
@hameerabbasi - and you probably would like them to be mutable too... it sounds like you want "0-D arrays", not scalars at all, like @shoyer and me. It may be that the three of us are coming from a perspective where it is really useful for

Anyway, not obvious what the path forward is here...
@mhvk right, personally I think things like

@hameerabbasi, me defending scalars has nothing to do with that issue. You can easily get consistency so that 0D behaves exactly like ND. For example
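The consistency claim (0D arrays behaving exactly like their N-D counterparts) can be illustrated with an assumed example, since the comment's own example was lost in transcription:

```python
import numpy as np

a0 = np.array(5.0)        # 0-d
a1 = np.array([5.0])      # 1-d

# Arithmetic, reductions and transposes follow the same rules at every rank.
assert (a0 + 1).shape == () and (a1 + 1).shape == (1,)
assert a0.mean() == 5.0 and a1.mean() == 5.0
assert a0.T.shape == () and a1.T.shape == (1,)
```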
Consistency is mostly what I care about. If we go to Python scalars we can't get this consistency.
Well, there are things you can't do, I agree. But, this thing isn't one of them.
@hameerabbasi sorry, tricky example. That one works now, but it used to be that if

I agree that Py scalars will probably be inconsistent math wise, etc. But I really don't see the point of having container methods on our scalars. If anyone got an example that is not just created by numpy converting a 0D array silently, I would be interested!
For some historical perspective: 0-d arrays were not well accepted by Numeric, and NumPy inherited this early on. Over time, 0-d arrays gained favor to the point where today it seems odd that we don't fully embrace them.

Also, this is an example of "user APIs" vs "developer APIs". 0-d arrays are perfectly fine and desired for "developers" but end up creating "user issues" that you have to carefully squash (printing, use in indexing, immutability for use as keys, ...). A developer can always get an actual scalar using .item() or [()], but a data-scientist user appreciates the convenience, which is messy.

At the same time, dtypes have several challenges, one of which is that they are a Python 1.x-style type concept where every type is an instance of a single Python type. Instead dtypes should be Python types specifically. Array scalars exist and have the same API as arrays because of these two architectural problems.

A NumPy 2.x should definitely remove array scalars. This can be done by embracing 0-d arrays and also building dtypes as actual types.
+1 on this. I'm not sure we necessarily need a numpy 2.0 for this change - if
Yes, we definitely need NumPy 2 for this. To do this right, it will require a breaking change that will require re-compilation of extensions and some deprecation of APIs. There are implications on the C-structure level that will change the ABI. There are implications on the API level that would be too much work to try and force into a 1.x series --- if someone wants to back-port some of the changes to 1.x, that could be done after the 2.x release.

I don't see how what I'm thinking about is related to #11998, which is an example of Python back-tracking --- it used to support multiple inheritance on the C-level.
If we want dtypes to be types, then we presumably want to end up with
You're almost certainly right there - it seems pretty likely that
Note this PR has been merged. We (the BIDS team) have been giving the dtype overhaul some thought. We have a working document with a proposal to make dtypes

Another venue for discussion is our weekly status sessions at noon Pacific time on Wednesdays, as published on the mailing list.
The NEP is listed with "Status: Draft", which means it appears under "Open NEPs" in the NEP index. Perhaps we should switch the status to either "Deferred" or "Final"? I think I would be fine with either -- obviously if we want to change the behavior of scalars/0-rank arrays in NumPy today we would need a new proposal. (FWIW, I agree mostly with @teoliphant.)
I'm not sure what the process is for informational NEPs. Do they need the 1 week notice to the mailing list?
Yes, in principle -- but this is also already out of date / describing a rationale for decisions made a decade ago. So I'm not sure there's really any point in that.
Fixes part of #12164. I reworked the formatting of the document, added links to mailing list discussions where I could, and removed references to changesets implementing `[...]` and `[()]` indexing that I could not find. The original wiki document refers to "Multidimensional Arrays for Python" by Travis Oliphant, draft 02-Feb-2005. Did this later become the Guide to NumPy? Some of the email discussions on sourceforge refer to a PEP that apparently is not PEP 209, since the text does not match.