8000 Do not convert subclasses of `ndarray` unless required by burnpanck · Pull Request #1118 · pydata/xarray · GitHub
[go: up one dir, main page]

Skip to content

Do not convert subclasses of ndarray unless required #1118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed
Next Next commit
use asanyarray instead of asarray in data accessor of Variable.
  • Loading branch information
burnpanck committed Nov 14, 2016
commit 40df7b0b73d97c053b4e8c7194b9bf2419a66f08
2 changes: 1 addition & 1 deletion xarray/core/variable.py
Original file line number Diff line number Diff line change
Expand Up @@ -185,7 +185,7 @@ def _as_array_or_item(data):

TODO: remove this (replace with np.asarray) once these issues are fixed
"""
data = np.asarray(data)
data = np.asanyarray(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should probably be renamed, e.g., _as_any_array_or_item, just to make it super clear what it's doing (also update the TODO in the docstring)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

if data.ndim == 0:
if data.dtype.kind == 'M':
data = np.datetime64(data, 'ns')
Expand Down
55 changes: 55 additions & 0 deletions xarray/test/test_quantities.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a docstring explaining that this module uses quantities as a concrete example, but really it's just checking for subclass compatibility.

Also, I might call it something more genericl ike test_numpy_subclasses.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I was rather thinking of this test-file as specialised towards physical units handling with quantities (i.e. as a regression-test database geared towards #525). I'd rather make a separate test-case for general subclasses, maybe with a dummy subclass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, test_units_subclass.py also seems reasonable.

import pandas as pd

from xarray import (align, broadcast, Dataset, DataArray, Variable)

from xarray.test import (TestCase, unittest)

try:
import quantities as pq
has_quantities = True
except ImportError:
has_quantities = False

def requires_quantities(test):
return test if has_quantities else unittest.skip('requires dask')(test)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update the message here.


@requires_quantities
class TestWithQuantities(TestCase):
def setUp(self):
self.x = np.arange(10) * pq.A
self.y = np.arange(20)
self.xp = np.arange(10) * pq.J
self.v = np.arange(10*20).reshape(10,20) * pq.V
self.da = DataArray(self.v,dims=['x','y'],coords=dict(x=self.x,y=self.y,xp=(['x'],self.xp)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run a PEP8 check on this code -- we try to stick to the 80 character line width and use spaces after commas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.


def assertEqualWUnits(self,a,b):
self.assertIsNotNone(getattr(a, 'units', None))
self.assertIsNotNone(getattr(b, 'units', None))
self.assertEqual(a.units,b.units)
np.testing.assert_allclose(a.magnitude,b.magnitude)

def test_units_in_data_and_coords(self):
da = self.da
self.assertEqualWUnits(da.xp.data,self.xp)
self.assertEqualWUnits(da.data,self.v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we adjust things so that DataArray.values still always returns a base numpy.ndarray, but DataArray.data can return a subclass? That would be consistent with our current behavior for handling duck-type arrays with dask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine, but I am not so happy about this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is, what should be the exact guarantee that .values wants to give above .data. For me, units are an inherent part of the data (see my comment in #525) My view here is that the behaviour of an instance of Quantity is indeed very close to that of the bare ndarray, and thus shouldn't be treated any different. The problem here is of course that this will not necessarily be the case for a general subclass. So, the question is, where to draw the line between .values and .data. Should it be bare ndarray vs. everything else or just ndarray subclass vs. things that behave like arrays but aren't. The best answer will depend both on what is out there in terms of subclasses and objects that just look like arrays, as well as what the code using xarray expects to be able to do with what it gets from .value. I don't want to judge that, because I think I don't know enough about the eco-system xarray lives in. (E.g. dask: Is a dask array an ndarray subclass or not?). In any case, if you think .values should be bare ndarray only, then I'll respect that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dask arrays are not ndarray subclasses, so agree, it's not entirely the same.

Either way, some users will need to wrap their calls to .values or .data with np.asanyarray() (to ensure the result is an ndarray subclass and not possibly a dask array) or np.asarray() (to ensure the result is a base ndarray). Actually, it's probably worth checking if np.asanyarray(data_array) does the right thing in your patch -- possibly DataArray.__array__ needs to be updated to call np.asanyarray.

Just speaking personally -- and I'm happy for other to chime in with their opinions -- I find a shortcut for np.asarray more essential than np.asanyarray. There are lots of libraries not written to handle anything other than base numpy array objects, and being able to convert into something that is guaranteed to work is valuable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so if you say .values is really just meant to be a shortcut over asarray or asanyarray, then I agree the former is more important.


def test_arithmetics(self):
da = self.da
f = DataArray(np.arange(10*20).reshape(10,20)*pq.A,dims=['x','y'],coords=dict(x=self.x,y=self.y))
self.assertEqualWUnits((da*f).data, da.data*f.data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about arithmetic that requires broadcasting? e.g., between arrays with different dimensions and/or different dimension order?

That will require ensuring that ops.transpose and ops.broadcast_to work on subclasses (e.g., xarray.Variable.transpose and xarray.Variable.expand_dims). I think the former should work already because of NumPy, but broadcast_to will probably require adding subok=True.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a go.


def test_unit_checking(self):
da = self.da
f = DataArray(np.arange(10*20).reshape(10,20)*pq.A,dims=['x','y'],coords=dict(x=self.x,y=self.y))
with self.assertRaisesRegex(ValueError,'Unable to convert between units'):
da + f

@unittest.expectedFailure
def test_units_in_indexes(self):
"""
Indexes are borrowed from Pandas, and Pandas does not support units.
Therefore, we currently don't intend to support units on idexes either.
"""
da = self.da
self.assertEqualWUnits(da.x.data,self.x)
0