formatting of singleton DataArrays · Issue #2791 · pydata/xarray · GitHub

formatting of singleton DataArrays #2791

Closed
yohai opened this issue Feb 27, 2019 · 8 comments

Comments

@yohai
Contributor
yohai commented Feb 27, 2019

Code Sample, a copy-pastable example if possible

import xarray as xr
a = xr.DataArray(1.234)
'{:1.3f}'.format(a)         # throws an error (TypeError)
'{:1.3f}'.format(a.values)  # behaves nicely: '1.234'

Problem description

I think it would be useful for repr(a) to return repr(a.values) when a.ndim==0, and perhaps also when a.size==1. For example, this would make the following code work:

a=xr.DataArray(range(4))
' '.join('{:d}'.format(v) for v in a)

while currently one has to write

' '.join('{:d}'.format(v.values) for v in a)

I tried to think of whether this would break anything else, but I couldn't come up with anything.

@shoyer
Member
shoyer commented Mar 4, 2019

Yes, I think this would be a nice addition. This would entail implementing a __format__ method on xarray.DataArray:
https://docs.python.org/3/reference/datamodel.html#object.__format__
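To illustrate the protocol shoyer links to, here is a minimal sketch using a hypothetical `Labeled` class (not part of xarray) that stands in for a 0-d labeled array. `str.format()`, `format()` and f-strings all call `type(obj).__format__(obj, spec)`; the default `object.__format__` raises `TypeError` for any non-empty spec, which is why the original example fails. Delegating to the wrapped value is one way to make numeric specs work.

```python
class Labeled:
    """Hypothetical stand-in for a 0-d labeled array: a value plus a name."""

    def __init__(self, value, name):
        self.value = value
        self.name = name

    def __repr__(self):
        # Verbose, array-like repr (loosely analogous to DataArray's repr).
        return f"<Labeled {self.name!r}>\n{self.value!r}"

    def __format__(self, format_spec):
        # Called by str.format(), format() and f-strings.
        if not format_spec:
            # An empty spec conventionally means str(self); here that is the repr.
            return str(self)
        # Delegate numeric specs such as '1.3f' to the wrapped value.
        return format(self.value, format_spec)


a = Labeled(1.234, "a")
print("{:1.3f}".format(a))  # 1.234
print(f"{a:1.3f}")          # 1.234
```

Keeping the empty-spec case on the verbose repr means `'{}'.format(a)` still shows that the object is not a plain number.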

@fujiisoup
Member

I agree that it is a bit annoying that a one-element DataArray prints so much information, especially when we want to embed the value into a string.
However, I'm a bit worried that it would be surprising if an object that looks like a native scalar is actually an xr.DataArray of one element, especially when working in an interactive environment.

@yohai
Contributor Author
yohai commented Mar 4, 2019

On the one hand I agree, but note that this already works for NumPy arrays:

import numpy as np
a=np.array([1,2,3,4])
' '.join('{:d}'.format(v) for v in a)
# returns '1 2 3 4'

@shoyer
Member
shoyer commented Mar 4, 2019

Here's a related NumPy issue: numpy/numpy#5543

I guess there are two possible behaviors for '{:d}'.format(x) where x is a DataArray object:

  • coerce scalar arrays to native Python numbers and format the resulting number
  • vectorize format() over each element of the array (the proposal in the linked numpy issue)

These behaviors would definitely conflict for scalar objects -- in the second case, we would still want to include some indication that it's an xarray.DataArray. NumPy doesn't have a conflict because indexing an array results in NumPy scalars, which print like Python builtin scalars.
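The conflict can be made concrete with a small sketch. It assumes a hypothetical `Toy` container (a flat list of values plus a shape tuple) and two hypothetical helpers, one per option above; none of these are xarray APIs.

```python
class Toy:
    """Hypothetical array stand-in: a flat list of values plus a shape tuple."""

    def __init__(self, data, shape):
        self.data = list(data)
        self.shape = tuple(shape)


def format_coerce(arr, spec):
    """Option 1: a single-element array formats like the Python number it holds."""
    if len(arr.data) != 1:
        raise TypeError(f"cannot coerce array of shape {arr.shape} to a scalar")
    return format(arr.data[0], spec)


def format_vectorized(arr, spec):
    """Option 2: apply the spec to every element and keep an array-like wrapper."""
    body = ", ".join(format(v, spec) for v in arr.data)
    return f"Toy([{body}], shape={arr.shape})"


scalar = Toy([1.234], ())
print(format_coerce(scalar, "1.3f"))      # 1.234
print(format_vectorized(scalar, "1.3f"))  # Toy([1.234], shape=())
```

For a 0-d input the two options give different strings: option 1 returns a bare number, while option 2 keeps the wrapper so the reader can tell it is still an array.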

@fujiisoup
Member

@yohai, sorry, I confused __format__ with __repr__.
I like shoyer's second option:

vectorize format() over each element of the array (the proposal in the linked numpy issue)

as I feel it is more consistent with the existing xarray __repr__.

I sometimes want a 0d DataArray to behave as a native scalar.
format is one typical case, but there are several others, e.g., np.ones(xr.DataArray([0])[0]).
Therefore, I always need to be careful about whether a scalar is an xarray object or not.

I am a bit worried that printing a 0d DataArray as a scalar would confuse me into thinking it is a scalar rather than a 0d array.

@yohai
Contributor Author
yohai commented Mar 8, 2019

I tend towards the former: coercing singleton arrays to behave as scalars of their dtype. I think it makes more sense in terms of use cases (at least for everything I have needed). I don't mind implementing it once there is agreement on which of the two to do.

These behaviors would definitely conflict for scalar objects -- in the second case, we would still want to include some indication that it's an xarray.DataArray. NumPy doesn't have a conflict because indexing an array results in NumPy scalars, which print like Python builtin scalars.

@shoyer I don't see why that would be the case. If I format something like '{:04d} {:3.5e} {:2.3E}'.format(*dataarray), I think the average user would expect to get something like '0043 4.35000e+02 2.450E+02' in return, without any indication that these are data arrays.

@yohai
Contributor Author
yohai commented Mar 9, 2019

To make things concrete, the solution I have in mind is as simple as adding this method to DataArray:

def __format__(self, format_spec):
    # delegate formatting to the underlying NumPy array
    return self.values.__format__(format_spec)

Here's one use case I have encountered:

import numpy as np
import xarray as xr

ds = xr.Dataset({'A': (['x', 'y', 'z'], np.random.rand(40, 40, 3)),
                 'B': (['z'], np.random.randn(3))},
                coords={'z': [31, 42, 45]})
fg = ds.A.plot(col='z')
for ax, d in zip(fg.axes.flat, fg.name_dicts.flat):
    t = ax.get_title()
    # formatting a 0d DataArray directly relies on DataArray.__format__
    ax.set_title('{} and B(z)={:1.2}'.format(t, ds.sel(**d).B))

[Screenshot of the resulting faceted plot]

This way, if you want to vectorize __format__ over an array, can't you simply do:

ar = xr.DataArray([39, 103, id(xr)])
print('{:3.3f} {:3.3e} {:x}'.format(*ar))
# prints something like `39.000 1.030e+02 10e5bb548` (the last value depends on id(xr))
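The reason unpacking works is that iterating over a 1-d array yields 0-d elements, so a per-element __format__ is enough and the container itself never needs a vectorized one. A minimal sketch with hypothetical `Scalar` and `Vec` classes (not xarray types); the constant `0x10E5BB548` merely stands in for the machine-specific `id(xr)` above.

```python
class Scalar:
    """Hypothetical 0-d wrapper, playing the role of what iteration yields."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"<Scalar>\n{self.value!r}"

    def __format__(self, format_spec):
        # Per-element delegation, as in the proposed DataArray.__format__.
        return format(self.value, format_spec) if format_spec else repr(self)


class Vec:
    """Hypothetical 1-d container whose iteration yields Scalar objects."""

    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        return (Scalar(v) for v in self.values)


ar = Vec([39, 103, 0x10E5BB548])
print("{:3.3f} {:3.3e} {:x}".format(*ar))  # 39.000 1.030e+02 10e5bb548

# The join example from the original post also works element by element.
print(" ".join("{:d}".format(v) for v in Vec(range(4))))  # 0 1 2 3
```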

@max-sixty
Collaborator

This seems to work now; closing.
