ndarray should offer __format__ that can adjust precision · Issue #5543 · numpy/numpy · GitHub

Open
brandon-rhodes opened this issue Feb 8, 2015 · 21 comments
@brandon-rhodes

In many wonderful cases an ndarray can be used in place of a Python float and Just Work.

But not in one case:

import numpy as np

n = 1.23
print('{0:.6} AU'.format(n))

n = np.array([1.23, 4.56])
print('{0:.6} AU'.format(n))

The output of the above code, at least under Python 3.4, is:

1.23 AU
Traceback (most recent call last):
  File "tmp9.py", line 7, in <module>
    print('{0:.6} AU'.format(n))
TypeError: non-empty format string passed to object.__format__

It would be a great convenience if the ndarray grew a __format__() method that understood the tiny mini-language of float formatting, and used the number of digits of precision specified there to make its own call to the standard NumPy vector array formatting. Users could control array appearance on the screen using a Python standard that many programmers already understand.
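
A minimal sketch of the idea, not NumPy's implementation: `FormattableArray` is a hypothetical subclass invented here for illustration, and it understands only the bare `.N` form used in the example above. It passes N to `np.array2string` as `precision`, which counts digits after the decimal point, whereas `.N` on a Python float counts significant digits, so the two are not strictly equivalent.

```python
import numpy as np


class FormattableArray(np.ndarray):
    """Hypothetical ndarray subclass; not part of NumPy."""

    def __format__(self, spec):
        # An empty spec keeps the default behaviour, i.e. str(array).
        if not spec:
            return str(self)
        # Only the '.N' precision form is handled in this sketch.
        if not (spec.startswith('.') and spec[1:].isdigit()):
            raise TypeError('unsupported format string: %r' % spec)
        # Reuse NumPy's own array printing with the requested precision.
        return np.array2string(self, precision=int(spec[1:]))


n = np.array([1.23456789, 4.56]).view(FormattableArray)
print('{0:.3} AU'.format(n))
```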

brandon-rhodes added a commit to skyfielders/python-skyfield that referenced this issue Feb 8, 2015
Without this tweak, the attempt to print was dying with an error,
because a NumPy array does not know what to do with a '.6' format
string:

Traceback (most recent call last):
  File "tmp9.py", line 9, in <module>
    print(mars(tt=2457061.5).position)
  File "/home/brandon/skyfield/skyfield/units.py", line 50, in __str__
    return '{0:.6} AU'.format(self.AU)
TypeError: non-empty format string passed to object.__format__

In response I have opened: numpy/numpy#5543
@njsmith
Member
njsmith commented Feb 8, 2015

That would be lovely, yes. Any interest in putting together a patch?

@brandon-rhodes
Author

Yes, I will try my hand at it and let you know if it works! Thanks for letting me know that the feature will be welcomed, before I got started.

@jaimefrio
Member

What would the expected output be? Numpy seems to be doing what all Python sequences do, and I don't think that breaking that commonality is a good idea:

>>> a = [1.23, 4.56]
>>> aa = np.array(a)

# Python 3.4
>>> print('{0:.6} AU'.format(a))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: non-empty format string passed to object.__format__
>>> print('{0:.6} AU'.format(aa))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: non-empty format string passed to object.__format__

# Python 2.7
>>> print('{0:.6} AU'.format(a))
[1.23, AU
>>> print('{0:.6} AU'.format(aa))
[ 1.23 AU

@njsmith
Member
njsmith commented Feb 8, 2015

Ah, I thought we were talking about array scalars (where we obviously
should respect the format string).

It does seem like supporting .format() in a useful way would be an
obviously good thing. If it's not obvious what the most useful way is then
someone should make a proposal on numpy-discussion and go from there.

@brandon-rhodes
Author

@jaimefrio — my understanding is that the entire design of NumPy arrays is precisely to do things that all Python sequences do not. Normal lists cannot accept division; NumPy arrays can. Normal lists cannot be taken to a power; NumPy arrays can. The whole point was to break commonality very nearly everywhere, from what I can see of its design, so that lists would act like numbers.

Accepting a format string is, in my view, symmetrical with division and being taken to a power.

The behavior I am anticipating is roughly that of running str() on a NumPy array, but with the format adjusted as though numpy.set_printoptions had been called to set whatever precision is specified by the format string.
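
One way to approximate that behavior today, as a sketch only: `format_like_str` is a hypothetical helper name, and it uses the `np.printoptions` context manager (available in NumPy 1.15 and later) rather than `numpy.set_printoptions`, so that the global setting is restored afterwards.

```python
import numpy as np


def format_like_str(arr, precision):
    """Hypothetical helper: str(arr) under a temporary precision setting."""
    # The context manager restores the previous print options on exit,
    # unlike a direct call to np.set_printoptions.
    with np.printoptions(precision=precision):
        return str(arr)


n = np.array([1.23456789, 4.56789012])
print(format_like_str(n, 6) + ' AU')
```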

@mhvk
Contributor
mhvk commented Feb 9, 2015

I like the suggestion. The array scalar case would certainly just be to have expected behaviour, and it would seem the route of least surprise to apply the format to the individual elements (and otherwise behave as if no format string was given).
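
Applying the format to the individual elements could look roughly like the sketch below. `format_elementwise` is a hypothetical name; the `formatter` argument of `np.array2string` is an existing NumPy feature, and the `'float_kind'` key is assumed to be the right one because the example array holds floats.

```python
import numpy as np


def format_elementwise(arr, spec):
    """Hypothetical helper: apply a float format spec to every element."""
    return np.array2string(
        arr,
        # 'float_kind' selects the formatter used for floating-point dtypes.
        formatter={'float_kind': lambda value: format(value, spec)},
    )


n = np.array([1.23, 4.56])
print(format_elementwise(n, '.3f'))
```

The brackets, separators and line breaking still come from NumPy's usual array printing; only the text of each element changes.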

@jaimefrio
Member

It certainly seems convenient, but there may be a good reason why Python lists, or tuples, do not do the same. The change in behavior from Python 2.x to 3.x probably means that raising an error is a conscious decision. I don't follow any of the Python forums, perhaps someone will know if this has been brought up before somewhere else.

Either way, it is probably a good idea to give this a run through the mailing list, where it will get looked at by more people than here.

@brandon-rhodes
Author

Python lists and tuples do not support __format__() because (a) they are non-uniform — they can have any sorts of data inside, so it is not clear whether a format string would be interpreted numerically or for strings or what, and (b) because they really do no formatting of their own: they just wrap their own parens and commas around the repr()'s of whatever strings, ints, floats, and other objects that they happen to contain.
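
The list behavior described here can be checked directly under Python 3. This is a small illustration using only built-ins; the variable name `values` is arbitrary.

```python
values = [1.23, 4.56]

# An empty format spec is accepted and falls back to str().
print(format(values, ''))

# A non-empty spec is rejected, because list inherits object.__format__.
try:
    format(values, '.6')
except TypeError as exc:
    print('TypeError:', exc)

# The list's own text is just the repr() of each element plus punctuation.
print(str(values) == '[' + ', '.join(repr(v) for v in values) + ']')
```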

A NumPy array (unless its members are of dtype object_, in which case I entirely agree with you that the array should follow the excellent lead of tuple() and list() by refusing any format string) is in a quite different situation: if it is numeric, then it does not contain smaller objects, and therefore cannot delegate to them with repr(). Instead it takes charge of formatting the floats or ints inside for the screen, making all kinds of format-y decisions like indentation, breaking its output into lines, and even omitting sections of data if there are too many floats.

The change in behavior was simply to avoid disappointing users who provide a format string: if the string is accepted by a type and raises no error, people reasonably expect to see some change in format. Under the old Python 2 behavior they did not; their format string was quietly accepted and then ignored. Here is the issue in which it was negotiated that an error was better for users than an ignored format string:

http://bugs.python.org/issue7994

I propose that NumPy arrays adhere quite strictly to this standard behavior: in situations where the array itself is formatting its contents, accepting and using the string will allow the user to decide how many decimal places are shown, in the standard way that users already set the precision of data they are printing. In situations where the NumPy array contains other objects, and is doing no numeric formatting of its own, then I see no problem with its ignoring the format string (unless someone wants to make the argument that a NumPy array ought to broadcast it to its members!) and raising the “you gave me a format string I can't use!” exception that is now standard in Python to warn users away from trying to format things in situations where the format will be ignored.
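
The dtype rule in this proposal might be sketched as follows. `format_array` is a hypothetical free function standing in for the proposed method, the error messages are invented, and only the bare `.N` precision form is recognized.

```python
import numpy as np


def format_array(arr, spec):
    """Hypothetical stand-in for the proposed ndarray.__format__."""
    arr = np.asarray(arr)
    # No format string: behave exactly like str().
    if not spec:
        return str(arr)
    # Object arrays do no numeric formatting of their own, so refuse the
    # spec the same way tuple and list do.
    if arr.dtype == object:
        raise TypeError('non-empty format string passed to an object array')
    if not (spec.startswith('.') and spec[1:].isdigit()):
        raise TypeError('unsupported format string: %r' % spec)
    # Numeric arrays: let the spec choose the printed precision.
    return np.array2string(arr, precision=int(spec[1:]))


print(format_array(np.array([1.23456789, 4.56]), '.3'))
```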

@jaimefrio
Member

You have me convinced... Thanks for the detailed explanation!

@brandon-rhodes
Author

Thank you for pushing for an improved proposal — the idea that a NumPy array should raise an exception when given a format string if the NumPy array's dtype is object_ had simply not occurred to me, and it's an edge case that users will be very happy that NumPy gets right!

@njsmith (or whoever would like to), feel free to assign this to me so that it shows up on my GitHub to-do list when I sit down this weekend. Thanks for the chance to add this!

@gustavla
gustavla commented Feb 2, 2017

I'm really interested in this feature (and willing to contribute) and wanted to check what progress has been made in the last two years. I can't seem to access numpy-discussion right now to see if the discussion has continued there.

@brandon-rhodes
Author

Apparently, no one assigned it to me. By that first weekend I had already forgotten about this issue given the press of other responsibilities — so, no progress from me yet.

@shoyer
Member
shoyer commented Feb 3, 2017

@gustavla see here for the numpy-discussion mailing list. But I don't think this has been discussed before.

Our rule is that API changes need to reach consensus on the mailing list. This feature feels like a pretty clear win to me (especially with the arrival of f-strings), so I don't anticipate any objections. Still, it would be good to come up with a concrete proposal on how it should work and run that by the mailing list before starting work.

@anntzer
Contributor
anntzer commented Feb 15, 2017

@gustavla Tangentially related issue: #6136 (just scratching my own itch...)

eric-wieser added a commit to eric-wieser/numpy that referenced this issue Oct 18, 2017
This fixes gh-7978

The behavior for other sized arrays is left unchanged, pending discussion in gh-5543
theodoregoetz pushed a commit to theodoregoetz/numpy that referenced this issue Oct 23, 2017
@akshaybabloo

In Python 3.7, the .format() seems to be working.

@asottile
Contributor
asottile commented Dec 8, 2018

throwing another hat into the ring from #12491

import numpy as np
x, y = np.array([-969100.0]), np.array([-4457000.0])

# This works
"%.4g, %.4g" % (x, y)

# This errors
"{:.4g}, {:.4g}".format(x, y)

happening because size-0 / size-1 arrays are treated as their scalar in many places except in __format__
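
Until arrays accept a format spec themselves, one workaround for single-element arrays is to pull out the scalar explicitly. This sketch relies on `ndarray.item()`, an existing NumPy method that returns the sole element as a plain Python object; the variable `text` is only for illustration.

```python
import numpy as np

x, y = np.array([-969100.0]), np.array([-4457000.0])

# item() returns the single element as a Python float, which understands
# the '.4g' spec that the array itself rejects.
text = "{:.4g}, {:.4g}".format(x.item(), y.item())
print(text)  # -9.691e+05, -4.457e+06
```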

BioGeek added a commit to BioGeek/thinc that referenced this issue Jan 29, 2020
If you run the [00_intro_to_thinc.ipynb](https://github.com/explosion/thinc/blob/master/examples/00_intro_to_thinc.ipynb) notebook on a GPU, you get the following error when you execute the cell where you create an optimizer and do several passes over the data.

```
TypeError                                 Traceback (most recent call last)

<ipython-input-27-c7aae9724b89> in <module>()
     21         total += Yh.shape[0]
     22     score = correct / total
---> 23     print(f" {i} {score:.3f}")

TypeError: unsupported format string passed to cupy.core.core.ndarray.__format__

```
Also see the related open NumPy issue: [ndarray should offer __format__ that can adjust precision #5543 ](numpy/numpy#5543).

This pull request proposes a simple fix/workaround.
honnibal pushed a commit to explosion/thinc that referenced this issue Jan 30, 2020
@scratchmex
Contributor

Can you guys help me with this? I want to implement this but don't know where to start

@mattip
Member
mattip commented Feb 19, 2021

Hi @scratchmex. You might want to:

  • read the contribution docs
  • play with the HEAD version a bit to understand what already exists and what is missing, especially around scalars, and 0d arrays
  • take a look at the formatting in the arrayprint module; to understand how the recursion works, you may need to step through it in a debugger.
  • try to work out how to extend the scalar.__format__ and array.__format__ to accept additional arguments

@eric-wieser
Member
eric-wieser commented Feb 19, 2021

I suspect this issue needs an NEP explaining how formatting will be designed to behave for numpy datatypes, especially for ndarray.

@grisaitis

been a while since this was last discussed.

what are the steps for adding this? is a NEP required first?

@mattip
Member
mattip commented Dec 10, 2022

@grisaitis please see PR #19550
