Vectorized lazy indexing by fujiisoup · Pull Request #1899 · pydata/xarray · GitHub

Vectorized lazy indexing #1899


Merged
merged 37 commits into from
Mar 6, 2018
37 commits
dceb298
Start working
fujiisoup Feb 9, 2018
c1b4b60
First support of lazy vectorized indexing.
fujiisoup Feb 9, 2018
d989a15
Some optimization.
fujiisoup Feb 9, 2018
218763c
Use unique to decompose vectorized indexing.
fujiisoup Feb 10, 2018
541fca3
Consolidate vectorizedIndexing
fujiisoup Feb 10, 2018
3e05a16
Support vectorized_indexing in h5py
fujiisoup Feb 11, 2018
030a2c4
Refactoring backend array. Added indexing.decompose_indexers. Drop un…
fujiisoup Feb 13, 2018
b9f97b4
typo
fujiisoup Feb 14, 2018
850f29c
bugfix and typo
fujiisoup Feb 14, 2018
943ec78
Fix based on @WeatherGod comments.
fujiisoup Feb 14, 2018
91aae64
Use enum-like object to indicate indexing-support types.
fujiisoup Feb 15, 2018
991c1da
Update test_decompose_indexers.
fujiisoup Feb 15, 2018
936954a
Bugfix and benchmarks.
fujiisoup Feb 15, 2018
9144965
fix: support outer/basic indexer in LazilyVectorizedIndexedArray
fujiisoup Feb 16, 2018
d1cb976
Merge branch 'master' into vectorized_lazy_indexing
fujiisoup Feb 16, 2018
95e1f1c
More comments.
fujiisoup Feb 16, 2018
180c4f5
Fixing style errors.
stickler-ci Feb 16, 2018
dbbe531
Remove unintended dupicate
fujiisoup Feb 16, 2018
c2e61ad
combine indexers for on-memory np.ndarray.
fujiisoup Feb 16, 2018
b545c3e
fix whats new
fujiisoup Feb 16, 2018
872de73
fix pydap
fujiisoup Feb 16, 2018
bb5d1f6
Update comments.
fujiisoup Feb 16, 2018
cfe29bb
Support VectorizedIndexing for rasterio. Some bugfix.
fujiisoup Feb 17, 2018
17a7dac
Merge branch 'master' into vectorized_lazy_indexing
fujiisoup Feb 18, 2018
2dff278
flake8
fujiisoup Feb 18, 2018
ead6327
More tests
fujiisoup Feb 18, 2018
a90ac05
Use LazilyIndexedArray for scalar array instead of loading.
fujiisoup Feb 18, 2018
73f4958
Support negative step slice in rasterio.
fujiisoup Feb 18, 2018
fd04966
Make slice-step always positive
fujiisoup Feb 18, 2018
b3c3d80
Bugfix in slice-slice
fujiisoup Feb 25, 2018
259f36c
Add pydap support.
fujiisoup Feb 25, 2018
0e7eb2e
Merge branch 'master' into vectorized_lazy_indexing
fujiisoup Feb 26, 2018
0c2e31b
Merge branch 'master' into vectorized_lazy_indexing
fujiisoup Mar 1, 2018
d8421a5
Merge branch 'master' into vectorized_lazy_indexing
fujiisoup Mar 1, 2018
7e0959c
Rename LazilyIndexedArray -> LazilyOuterIndexedArray. Remove duplicat…
fujiisoup Mar 2, 2018
4fccdee
flake8
fujiisoup Mar 2, 2018
8e96710
Added transpose to LazilyOuterIndexedArray
fujiisoup Mar 3, 2018
Rename LazilyIndexedArray -> LazilyOuterIndexedArray. Remove duplicate in zarr.py
fujiisoup committed Mar 2, 2018
commit 7e0959cf0d8f0c64a39f7a27ec0fe453d282349f
2 changes: 1 addition & 1 deletion xarray/backends/h5netcdf_.py
@@ -90,7 +90,7 @@ def __init__(self, filename, mode='r', format=None, group=None,
     def open_store_variable(self, name, var):
         with self.ensure_open(autoclose=False):
             dimensions = var.dimensions
-            data = indexing.LazilyIndexedArray(
+            data = indexing.LazilyOuterIndexedArray(
                 H5NetCDFArrayWrapper(name, self))
             attrs = _read_attributes(var)

2 changes: 1 addition & 1 deletion xarray/backends/netCDF4_.py
@@ -279,7 +279,7 @@ def open(cls, filename, mode='r', format='NETCDF4', group=None,
     def open_store_variable(self, name, var):
         with self.ensure_open(autoclose=False):
             dimensions = var.dimensions
-            data = indexing.LazilyIndexedArray(NetCDF4ArrayWrapper(name, self))
+            data = indexing.LazilyOuterIndexedArray(NetCDF4ArrayWrapper(name, self))

Review comment (Contributor): E501 line too long (84 > 79 characters)

             attributes = OrderedDict((k, var.getncattr(k))
                                      for k in var.ncattrs())
             _ensure_fill_value_valid(data, attributes)

2 changes: 1 addition & 1 deletion xarray/backends/pydap_.py
@@ -78,7 +78,7 @@ def open(cls, url, session=None):
         return cls(ds)

     def open_store_variable(self, var):
-        data = indexing.LazilyIndexedArray(PydapArrayWrapper(var))
+        data = indexing.LazilyOuterIndexedArray(PydapArrayWrapper(var))
         return Variable(var.dimensions, data,
                         _fix_attributes(var.attributes))

2 changes: 1 addition & 1 deletion xarray/backends/pynio_.py
@@ -56,7 +56,7 @@ def __init__(self, filename, mode='r', autoclose=False):
         self._mode = mode

     def open_store_variable(self, name, var):
-        data = indexing.LazilyIndexedArray(NioArrayWrapper(name, self))
+        data = indexing.LazilyOuterIndexedArray(NioArrayWrapper(name, self))
         return Variable(var.dimensions, data, var.attributes)

     def get_variables(self):

2 changes: 1 addition & 1 deletion xarray/backends/rasterio_.py
@@ -274,7 +274,7 @@ def open_rasterio(filename, parse_coordinates=None, chunks=None, cache=None,
             else:
                 attrs[k] = v

-    data = indexing.LazilyIndexedArray(RasterioArrayWrapper(riods))
+    data = indexing.LazilyOuterIndexedArray(RasterioArrayWrapper(riods))

     # this lets you write arrays loaded with rasterio
     data = indexing.CopyOnWriteArray(data)

30 changes: 3 additions & 27 deletions xarray/backends/zarr.py
@@ -41,30 +41,6 @@ def _ensure_valid_fill_value(value, dtype):
     return _encode_zarr_attr_value(valid)


-def _replace_slices_with_arrays(key, shape):
-    """Replace slice objects in vindex with equivalent ndarray objects."""
-    num_slices = sum(1 for k in key if isinstance(k, slice))
-    ndims = [k.ndim for k in key if isinstance(k, np.ndarray)]
-    array_subspace_size = max(ndims) if ndims else 0
-    assert len(key) == len(shape)
-    new_key = []
-    slice_count = 0
-    for k, size in zip(key, shape):
-        if isinstance(k, slice):
-            # the slice subspace always appears after the ndarray subspace
-            array = np.arange(*k.indices(size))
-            sl = [np.newaxis] * len(shape)
-            sl[array_subspace_size + slice_count] = slice(None)
-            k = array[tuple(sl)]
-            slice_count += 1
-        else:
-            assert isinstance(k, np.ndarray)
-            k = k[(slice(None),) * array_subspace_size +
-                  (np.newaxis,) * num_slices]
-        new_key.append(k)
-    return tuple(new_key)
-
-
 class ZarrArrayWrapper(BackendArray):
     def __init__(self, variable_name, datastore):
         self.datastore = datastore
@@ -84,8 +60,8 @@ def __getitem__(self, key):
         if isinstance(key, indexing.BasicIndexer):
             return array[key.tuple]
         elif isinstance(key, indexing.VectorizedIndexer):
-            return array.vindex[_replace_slices_with_arrays(key.tuple,
-                                                            self.shape)]
+            return array.vindex[indexing._arrayize_vectorized_indexer(
+                key.tuple, self.shape).tuple]
         else:
             assert isinstance(key, indexing.OuterIndexer)
             return array.oindex[key.tuple]
@@ -292,7 +268,7 @@ def __init__(self, zarr_group, writer=None):
         super(ZarrStore, self).__init__(zarr_writer)

     def open_store_variable(self, name, zarr_array):
-        data = indexing.LazilyIndexedArray(ZarrArrayWrapper(name, self))
+        data = indexing.LazilyOuterIndexedArray(ZarrArrayWrapper(name, self))
         dimensions, attributes = _get_zarr_dims_and_attrs(zarr_array,
                                                           _DIMENSION_KEY)
         attributes = OrderedDict(attributes)

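Note on the zarr.py change: the `vindex` path needs every element of the key to be an integer array, so slices in a vectorized key have to be expanded first. The following is a minimal NumPy-only sketch of that idea, loosely modelled on the removed `_replace_slices_with_arrays`. The name `arrayize_key` is hypothetical and is not xarray API, and the real `_arrayize_vectorized_indexer` may differ in detail.

```python
import numpy as np


def arrayize_key(key, shape):
    """Replace slices in a vectorized key with broadcastable integer arrays.

    Hypothetical helper for illustration only (not part of xarray).
    Array axes come first in the result; each slice gets its own
    trailing axis, in the order the slices appear in ``key``.
    """
    assert len(key) == len(shape)
    num_slices = sum(1 for k in key if isinstance(k, slice))
    ndims = [k.ndim for k in key if isinstance(k, np.ndarray)]
    array_ndim = max(ndims) if ndims else 0

    new_key = []
    slice_count = 0
    for k, size in zip(key, shape):
        if isinstance(k, slice):
            # Expand the slice to explicit positions on its own axis.
            values = np.arange(*k.indices(size))
            expander = [np.newaxis] * (array_ndim + num_slices)
            expander[array_ndim + slice_count] = slice(None)
            k = values[tuple(expander)]
            slice_count += 1
        else:
            # Arrays keep their axes and gain one trailing axis per slice.
            k = k[(Ellipsis,) + (np.newaxis,) * num_slices]
        new_key.append(k)
    return tuple(new_key)


data = np.arange(80).reshape(8, 10)
rows, cols = arrayize_key((np.arange(5), slice(None)), data.shape)
print(rows.shape, cols.shape)  # (5, 1) (1, 10)
```

Because the slice axes are placed after the array axes, a key such as `(slice(None), np.arange(5))` yields a result whose axes are swapped relative to NumPy's own `data[:, np.arange(5)]`.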
2 changes: 1 addition & 1 deletion xarray/conventions.py
@@ -490,7 +490,7 @@ def decode_cf_variable(name, var, concat_characters=True, mask_and_scale=True,
         del attributes['dtype']
         data = BoolTypeArray(data)

-    return Variable(dimensions, indexing.LazilyIndexedArray(data),
+    return Variable(dimensions, indexing.LazilyOuterIndexedArray(data),
                     attributes, encoding=encoding)


12 changes: 7 additions & 5 deletions xarray/core/indexing.py
@@ -440,7 +440,7 @@ def __getitem__(self, key):
         return self.array[self.indexer_cls(key)]


-class LazilyIndexedArray(ExplicitlyIndexedNDArrayMixin):
+class LazilyOuterIndexedArray(ExplicitlyIndexedNDArrayMixin):
     """Wrap an array to make basic and orthogonal indexing lazy.
     """

@@ -541,10 +541,10 @@ def _updated_key(self, new_key):
         return _combine_indexers(self.key, self.shape, new_key)

     def __getitem__(self, indexer):
-        # If the indexed array becomes a scalar, return LazilyIndexedArray.
+        # If the indexed array becomes a scalar, return LazilyOuterIndexedArray.

Review comment (Contributor): E501 line too long (80 > 79 characters)

         if all(isinstance(ind, integer_types) for ind in indexer.tuple):
             key = BasicIndexer(tuple(k[indexer.tuple] for k in self.key.tuple))
-            return LazilyIndexedArray(self.array, key)
+            return LazilyOuterIndexedArray(self.array, key)
         return type(self)(self.array, self._updated_key(indexer))

     def transpose(self, order):
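
Note on what "lazy" means for the renamed class: its docstring says it wraps an array to make basic and orthogonal indexing lazy. The toy class below sketches that idea under simplifying assumptions: it accepts only slices and 1-D integer sequences (no integer keys, no vectorized keys), and `ToyLazyOuterIndexed` is a hypothetical name, not the xarray implementation.

```python
import numpy as np


class ToyLazyOuterIndexed:
    """Defer orthogonal (outer) indexing until the data is requested.

    Illustrative only: supports slices and 1-D integer sequences,
    one per axis. Integer keys are deliberately not handled.
    """

    def __init__(self, array, key=None):
        self.array = array
        if key is None:
            key = tuple(np.arange(n) for n in array.shape)
        self.key = key  # one 1-D array of selected positions per axis

    @property
    def shape(self):
        return tuple(len(k) for k in self.key)

    def __getitem__(self, new_key):
        if not isinstance(new_key, tuple):
            new_key = (new_key,)
        # Pad with full slices so every axis has an entry.
        new_key = new_key + (slice(None),) * (len(self.key) - len(new_key))
        # Composing keys only indexes the stored positions; no data is read.
        combined = tuple(old[new] for old, new in zip(self.key, new_key))
        return ToyLazyOuterIndexed(self.array, combined)

    def __array__(self, dtype=None, copy=None):
        # Only here is the wrapped array actually indexed.
        return np.asarray(self.array[np.ix_(*self.key)], dtype=dtype)


data = np.arange(24).reshape(4, 6)
lazy = ToyLazyOuterIndexed(data)[1:, ::2][[0, 2], :2]
print(lazy.shape)  # (2, 2)
```

Chained indexing only updates the stored per-axis positions; the wrapped array is read once, when `np.asarray(lazy)` is called.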
@@ -645,7 +645,7 @@ def _outer_to_vectorized_indexer(key, shape):

     Returns
     -------
-    VectorizedInexer
+    VectorizedIndexer
         Tuple suitable for use to index a NumPy array with vectorized indexing.
         Each element is an array: broadcasting them together gives the shape
         of the result.
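
Note on the docstring above ("broadcasting them together gives the shape of the result"): the sketch below shows one way an outer key can be turned into a vectorized key with plain NumPy. It assumes the key contains only slices and 1-D integer arrays; `outer_to_vectorized` is a hypothetical name, and the real `_outer_to_vectorized_indexer` also has to deal with integer keys.

```python
import numpy as np


def outer_to_vectorized(key, shape):
    """Convert per-axis keys into arrays that broadcast to the outer result.

    Illustrative sketch: each key must be a slice or a 1-D integer array.
    """
    n = len(key)
    new_key = []
    for axis, (k, size) in enumerate(zip(key, shape)):
        if isinstance(k, slice):
            k = np.arange(*k.indices(size))
        k = np.asarray(k)
        # Give each key its own axis: shape (1, ..., len(k), ..., 1).
        new_shape = [1] * n
        new_shape[axis] = k.size
        new_key.append(k.reshape(new_shape))
    return tuple(new_key)


data = np.arange(120).reshape(4, 5, 6)
key = (slice(1, 3), np.array([0, 4]), np.array([5, 1, 2]))
vec = outer_to_vectorized(key, data.shape)
print(np.broadcast(*vec).shape)  # (2, 2, 3)
```

For this restricted case the result matches what `np.ix_` builds, so `data[vec]` equals `data[np.ix_([1, 2], [0, 4], [5, 1, 2])]`.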
@@ -923,7 +923,9 @@ def _decompose_outer_indexer(indexer, shape, indexing_support):
         return (OuterIndexer(tuple(backend_indexer)),
                 OuterIndexer(tuple(np_indexer)))

-    # basic
+    # basic indexer
+    assert indexing_support == IndexingSupport.BASIC
+
     for k, s in zip(indexer, shape):
         if isinstance(k, np.ndarray):
             # np.ndarray key is converted to slice that covers the entire
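
Note on the decomposition for backends with only basic indexing support: the (truncated) comment in the hunk above says an array key is converted to a covering slice. A simplified sketch of that split for a single axis could look as follows. It assumes a non-empty 1-D key of non-negative integers; `decompose_for_basic_backend` is a hypothetical name, and the real `_decompose_outer_indexer` handles more cases and returns indexer objects rather than raw keys.

```python
import numpy as np


def decompose_for_basic_backend(k):
    """Split a 1-D integer key into (backend slice, in-memory key).

    Illustrative sketch: assumes a non-empty key of non-negative integers.
    """
    k = np.asarray(k)
    start, stop = int(k.min()), int(k.max()) + 1
    backend_key = slice(start, stop)  # what a slice-only backend can load
    memory_key = k - start            # applied afterwards with NumPy
    return backend_key, memory_key


data = np.arange(100) * 10
wanted = np.array([7, 3, 9, 3])
backend_key, memory_key = decompose_for_basic_backend(wanted)
block = data[backend_key]    # one contiguous read from the "backend"
result = block[memory_key]   # fancy indexing happens in memory
```

The trade-off is that the backend may load more elements than requested (here positions 3 through 9), in exchange for needing only slice support.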
2 changes: 1 addition & 1 deletion xarray/core/variable.py
@@ -121,7 +121,7 @@ def _maybe_wrap_data(data):
     Put pandas.Index and numpy.ndarray arguments in adapter objects to ensure
     they can be indexed properly.

-    NumpyArrayAdapter, PandasIndexAdapter and LazilyIndexedArray should
+    NumpyArrayAdapter, PandasIndexAdapter and LazilyOuterIndexedArray should
     all pass through unmodified.
     """
     if isinstance(data, pd.Index):

21 changes: 0 additions & 21 deletions xarray/tests/test_backends.py
@@ -1402,27 +1402,6 @@ def create_zarr_target(self):
             yield tmp


-def test_replace_slices_with_arrays():
-    (actual,) = xr.backends.zarr._replace_slices_with_arrays(
-        key=(slice(None),), shape=(5,))
-    np.testing.assert_array_equal(actual, np.arange(5))
-
-    actual = xr.backends.zarr._replace_slices_with_arrays(
-        key=(np.arange(5),) * 3, shape=(8, 10, 12))
-    expected = np.stack([np.arange(5)] * 3)
-    np.testing.assert_array_equal(np.stack(actual), expected)
-
-    a, b = xr.backends.zarr._replace_slices_with_arrays(
-        key=(np.arange(5), slice(None)), shape=(8, 10))
-    np.testing.assert_array_equal(a, np.arange(5)[:, np.newaxis])
-    np.testing.assert_array_equal(b, np.arange(10)[np.newaxis, :])
-
-    a, b = xr.backends.zarr._replace_slices_with_arrays(
-        key=(slice(None), np.arange(5)), shape=(8, 10))
-    np.testing.assert_array_equal(a, np.arange(8)[np.newaxis, :])
-    np.testing.assert_array_equal(b, np.arange(5)[:, np.newaxis])
-
-
 @requires_scipy
 class ScipyInMemoryDataTest(CFEncodedDataTest, NetCDF3Only, TestCase):
     engine = 'scipy'

2 changes: 1 addition & 1 deletion xarray/tests/test_dataset.py
@@ -77,7 +77,7 @@ def get_variables(self):
         def lazy_inaccessible(k, v):
             if k in self._indexvars:
                 return v
-            data = indexing.LazilyIndexedArray(InaccessibleArray(v.values))
+            data = indexing.LazilyOuterIndexedArray(InaccessibleArray(v.values))

Review comment (Contributor): E501 line too long (80 > 79 characters)

             return Variable(v.dims, data, v.attrs)
         return dict((k, lazy_inaccessible(k, v)) for
                     k, v in iteritems(self._variables))

39 changes: 31 additions & 8 deletions xarray/tests/test_indexing.py
@@ -153,7 +153,7 @@ def test_lazily_indexed_array(self):
         original = np.random.rand(10, 20, 30)
         x = indexing.NumpyIndexingAdapter(original)
         v = Variable(['i', 'j', 'k'], original)
-        lazy = indexing.LazilyIndexedArray(x)
+        lazy = indexing.LazilyOuterIndexedArray(x)
         v_lazy = Variable(['i', 'j', 'k'], lazy)
         I = ReturnItem()  # noqa: E741  # allow ambiguous name
         # test orthogonally applied indexers
@@ -172,7 +172,7 @@ def test_lazily_indexed_array(self):
                         assert expected.shape == actual.shape
                         assert_array_equal(expected, actual)
                         assert isinstance(actual._data,
-                                          indexing.LazilyIndexedArray)
+                                          indexing.LazilyOuterIndexedArray)

                         # make sure actual.key is appropriate type
                         if all(isinstance(k, native_int_types + (slice, ))
@@ -191,15 +191,15 @@ def test_lazily_indexed_array(self):
             actual = v_lazy[i][j]
             assert expected.shape == actual.shape
             assert_array_equal(expected, actual)
-            assert isinstance(actual._data, indexing.LazilyIndexedArray)
+            assert isinstance(actual._data, indexing.LazilyOuterIndexedArray)
             assert isinstance(actual._data.array,
                               indexing.NumpyIndexingAdapter)

     def test_vectorized_lazily_indexed_array(self):
         original = np.random.rand(10, 20, 30)
         x = indexing.NumpyIndexingAdapter(original)
         v_eager = Variable(['i', 'j', 'k'], x)
-        lazy = indexing.LazilyIndexedArray(x)
+        lazy = indexing.LazilyOuterIndexedArray(x)
         v_lazy = Variable(['i', 'j', 'k'], lazy)
         I = ReturnItem()  # noqa: E741  # allow ambiguous name

@@ -210,7 +210,7 @@ def check_indexing(v_eager, v_lazy, indexers):
                 assert expected.shape == actual.shape
                 assert isinstance(actual._data,
                                   (indexing.LazilyVectorizedIndexedArray,
-                                   indexing.LazilyIndexedArray))
+                                   indexing.LazilyOuterIndexedArray))
                 assert_array_equal(expected, actual)
                 v_eager = expected
                 v_lazy = actual
@@ -263,19 +263,19 @@ def test_index_scalar(self):

 class TestMemoryCachedArray(TestCase):
     def test_wrapper(self):
-        original = indexing.LazilyIndexedArray(np.arange(10))
+        original = indexing.LazilyOuterIndexedArray(np.arange(10))
         wrapped = indexing.MemoryCachedArray(original)
         assert_array_equal(wrapped, np.arange(10))
         assert isinstance(wrapped.array, indexing.NumpyIndexingAdapter)

     def test_sub_array(self):
-        original = indexing.LazilyIndexedArray(np.arange(10))
+        original = indexing.LazilyOuterIndexedArray(np.arange(10))
         wrapped = indexing.MemoryCachedArray(original)
         child = wrapped[B[:5]]
         assert isinstance(child, indexing.MemoryCachedArray)
         assert_array_equal(child, np.arange(5))
         assert isinstance(child.array, indexing.NumpyIndexingAdapter)
-        assert isinstance(wrapped.array, indexing.LazilyIndexedArray)
+        assert isinstance(wrapped.array, indexing.LazilyOuterIndexedArray)

     def test_setitem(self):
         original = np.arange(10)
@@ -389,6 +389,29 @@ def test_arrayize_vectorized_indexer(self):
             np.testing.assert_array_equal(
                 self.data[vindex], self.data[vindex_array],)

+        actual = indexing._arrayize_vectorized_indexer(
+            indexing.VectorizedIndexer((slice(None),)), shape=(5,))
+        np.testing.assert_array_equal(actual.tuple, [np.arange(5)])
+
+        actual = indexing._arrayize_vectorized_indexer(
+            indexing.VectorizedIndexer((np.arange(5),) * 3), shape=(8, 10, 12))
+        expected = np.stack([np.arange(5)] * 3)
+        np.testing.assert_array_equal(np.stack(actual.tuple), expected)
+
+        actual = indexing._arrayize_vectorized_indexer(
+            indexing.VectorizedIndexer((np.arange(5), slice(None))),
+            shape=(8, 10))
+        a, b = actual.tuple
+        np.testing.assert_array_equal(a, np.arange(5)[:, np.newaxis])
+        np.testing.assert_array_equal(b, np.arange(10)[np.newaxis, :])
+
+        actual = indexing._arrayize_vectorized_indexer(
+            indexing.VectorizedIndexer((slice(None), np.arange(5))),
+            shape=(8, 10))
+        a, b = actual.tuple
+        np.testing.assert_array_equal(a, np.arange(8)[np.newaxis, :])
+        np.testing.assert_array_equal(b, np.arange(5)[:, np.newaxis])
+

 def get_indexers(shape, mode):
     if mode == 'vectorized':

18 changes: 9 additions & 9 deletions xarray/tests/test_variable.py
@@ -16,7 +16,7 @@
 from xarray.core import indexing
 from xarray.core.common import full_like, ones_like, zeros_like
 from xarray.core.indexing import (
-    BasicIndexer, CopyOnWriteArray, DaskIndexingAdapter, LazilyIndexedArray,
+    BasicIndexer, CopyOnWriteArray, DaskIndexingAdapter, LazilyOuterIndexedArray,

Review comment (Contributor): E501 line too long (81 > 79 characters)

     MemoryCachedArray, NumpyIndexingAdapter, OuterIndexer, PandasIndexAdapter,
     VectorizedIndexer)
 from xarray.core.pycompat import PY3, OrderedDict
@@ -988,9 +988,9 @@ def test_repr(self):
         assert expected == repr(v)

     def test_repr_lazy_data(self):
-        v = Variable('x', LazilyIndexedArray(np.arange(2e5)))
+        v = Variable('x', LazilyOuterIndexedArray(np.arange(2e5)))
         assert '200000 values with dtype' in repr(v)
-        assert isinstance(v._data, LazilyIndexedArray)
+        assert isinstance(v._data, LazilyOuterIndexedArray)

     def test_detect_indexer_type(self):
         """ Tests indexer type was correctly detected. """
@@ -1798,7 +1798,7 @@ def test_rolling_window(self):

 class TestAsCompatibleData(TestCase):
     def test_unchanged_types(self):
-        types = (np.asarray, PandasIndexAdapter, indexing.LazilyIndexedArray)
+        types = (np.asarray, PandasIndexAdapter, indexing.LazilyOuterIndexedArray)

Review comment (Contributor): E501 line too long (82 > 79 characters)

         for t in types:
             for data in [np.arange(3),
                          pd.date_range('2000-01-01', periods=3),
@@ -1961,17 +1961,17 @@ def test_NumpyIndexingAdapter(self):
         v = Variable(dims=('x', 'y'), data=NumpyIndexingAdapter(
             NumpyIndexingAdapter(self.d)))

-    def test_LazilyIndexedArray(self):
-        v = Variable(dims=('x', 'y'), data=LazilyIndexedArray(self.d))
+    def test_LazilyOuterIndexedArray(self):
+        v = Variable(dims=('x', 'y'), data=LazilyOuterIndexedArray(self.d))
         self.check_orthogonal_indexing(v)
         self.check_vectorized_indexing(v)
         # doubly wrapping
         v = Variable(dims=('x', 'y'),
-                     data=LazilyIndexedArray(LazilyIndexedArray(self.d)))
+                     data=LazilyOuterIndexedArray(LazilyOuterIndexedArray(self.d)))

Review comment (Contributor): E501 line too long (83 > 79 characters)

         self.check_orthogonal_indexing(v)
         # hierarchical wrapping
         v = Variable(dims=('x', 'y'),
-                     data=LazilyIndexedArray(NumpyIndexingAdapter(self.d)))
+                     data=LazilyOuterIndexedArray(NumpyIndexingAdapter(self.d)))

Review comment (Contributor): E501 line too long (80 > 79 characters)

         self.check_orthogonal_indexing(v)

     def test_CopyOnWriteArray(self):
@@ -1980,7 +1980,7 @@ def test_CopyOnWriteArray(self):
         self.check_vectorized_indexing(v)
         # doubly wrapping
         v = Variable(dims=('x', 'y'),
-                     data=CopyOnWriteArray(LazilyIndexedArray(self.d)))
+                     data=CopyOnWriteArray(LazilyOuterIndexedArray(self.d)))
         self.check_orthogonal_indexing(v)
         self.check_vectorized_indexing(v)
