Description
The gist of the issue is that when storing an array-like object in a Zarr Group, Zarr looks for a chunks attribute to decide how to chunk the new array. On Dask Arrays, however, chunks is not a single global chunk shape: it is a tuple of per-dimension tuples giving the size of every individual chunk, and those sizes need not be uniform. Zarr understandably stumbles over this, as the format is not what it expects.
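To make the mismatch concrete: Zarr describes chunking with one chunk shape per array (one integer per dimension), whereas Dask records the size of every chunk along each dimension. The pure-Python sketch below expands a Zarr-style chunk shape into the Dask-style tuple of tuples. The function name to_dask_style_chunks is hypothetical, and neither library is imported. For the reproducer's array, shape (100, 110) with chunks=10, it yields ten chunks along the first axis and eleven along the second, which is the kind of value Zarr ends up receiving.

```python
def to_dask_style_chunks(shape, chunk_shape):
    """Expand a uniform, Zarr-style chunk shape into Dask-style per-chunk sizes.

    Hypothetical helper for illustration only; neither Zarr nor Dask is imported.
    """
    result = []
    for dim_len, size in zip(shape, chunk_shape):
        full, remainder = divmod(dim_len, size)
        sizes = (size,) * full
        if remainder:
            # The last chunk along an axis may be smaller than the others.
            sizes += (remainder,)
        result.append(sizes)
    return tuple(result)


zarr_style = (10, 10)  # Zarr: one chunk size per dimension
dask_style = to_dask_style_chunks((100, 110), zarr_style)
print(len(dask_style[0]), len(dask_style[1]))  # 10 11
print(to_dask_style_chunks((25,), (10,)))      # ((10, 10, 5),)
```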
Even if Zarr could somehow handle the Dask chunking format, there is still the question of what to do with non-uniform chunk sizes. There are two main options to consider: support non-uniform chunking in Zarr (https://github.com/zarr-developers/zarr/issues/245) and/or rechunk Dask Arrays to be uniform (dask/dask#3302). So there are things to think about on both fronts, and this issue should give both of those issues more context.
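Whether a given Dask chunking can be handed to Zarr as-is comes down to a simple check: along each axis, every chunk except the last must have the same size, and the last may be smaller (Zarr allows a partial chunk at the array edge). When that check fails, the rechunking route needs one uniform size per axis. The pure-Python sketch below illustrates both steps; the helper names are hypothetical, and picking the largest existing chunk as the new uniform size is just one possible policy, not something either library prescribes.

```python
def uniform_chunk_shape(dask_chunks):
    """Return a Zarr-style chunk shape if every axis is regularly chunked, else None.

    Hypothetical helper: along each axis all chunks but the last must be equal,
    and the last may be smaller (a partial chunk at the array edge).
    """
    shape = []
    for sizes in dask_chunks:
        if not sizes:
            return None
        first = sizes[0]
        if any(s != first for s in sizes[:-1]) or sizes[-1] > first:
            return None
        shape.append(first)
    return tuple(shape)


def propose_rechunk(dask_chunks):
    """Pick one uniform size per axis (the largest existing chunk).

    Hypothetical policy for illustration; other choices are equally valid.
    """
    return tuple(max(sizes) for sizes in dask_chunks)


regular = ((10,) * 10, (10,) * 11)
irregular = ((10,) * 10, (5, 15) + (10,) * 9)  # axis 1 still sums to 110
print(uniform_chunk_shape(regular))    # (10, 10)
print(uniform_chunk_shape(irregular))  # None
print(propose_rechunk(irregular))      # (10, 15)
```

With Dask available, the proposed sizes could then be passed to the existing Array.rechunk method (for example a.rechunk((10, 15))), after which the chunking is regular along every axis and uniform_chunk_shape would accept it.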
cc @mrocklin
Minimal, reproducible code sample, a copy-pastable example if possible
In [1]: import zarr
In [2]: import dask.array as da
In [3]: z = zarr.open_group("test.zarr")
In [4]: a = da.random.random((100, 110), chunks=10)
In [5]: z["a"] = a
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-142459babd67> in <module>()
----> 1 z["a"] = a
/zopt/conda2/envs/test/lib/python3.6/site-packages/zarr/hierarchy.py in __setitem__(self, item, value)
335
336 def __setitem__(self, item, value):
--> 337 self.array(item, value, overwrite=True)
338
339 def __delitem__(self, item):
/zopt/conda2/envs/test/lib/python3.6/site-packages/zarr/hierarchy.py in array(self, name, data, **kwargs)
908 """Create an array. Keyword arguments as per
909 :func:`zarr.creation.array`."""
--> 910 return self._write_op(self._array_nosync, name, data, **kwargs)
911
912 def _array_nosync(self, name, data, **kwargs):
/zopt/conda2/envs/test/lib/python3.6/site-packages/zarr/hierarchy.py in _write_op(self, f, *args, **kwargs)
628
629 with lock:
--> 630 return f(*args, **kwargs)
631
632 def create_group(self, name, overwrite=False):
/zopt/conda2/envs/test/lib/python3.6/site-packages/zarr/hierarchy.py in _array_nosync(self, name, data, **kwargs)
915 kwargs.setdefault('cache_attrs', self.attrs.cache)
916 return array(data, store=self._store, path=path, chunk_store=self._chunk_store,
--> 917 **kwargs)
918
919 def empty_like(self, name, data, **kwargs):
/zopt/conda2/envs/test/lib/python3.6/site-packages/zarr/creation.py in array(data, **kwargs)
336
337 # instantiate array
--> 338 z = create(**kwargs)
339
340 # fill with data
/zopt/conda2/envs/test/lib/python3.6/site-packages/zarr/creation.py in create(shape, chunks, dtype, compressor, fill_value, order, store, synchronizer, overwrite, path, chunk_store, filters, cache_metadata, cache_attrs, read_only, object_codec, **kwargs)
117 init_array(store, shape=shape, chunks=chunks, dtype=dtype, compressor=compressor,
118 fill_value=fill_value, order=order, overwrite=overwrite, path=path,
--> 119 chunk_store=chunk_store, filters=filters, object_codec=object_codec)
120
121 # instantiate array
/zopt/conda2/envs/test/lib/python3.6/site-packages/zarr/storage.py in init_array(store, shape, chunks, dtype, compressor, fill_value, order, overwrite, path, chunk_store, filters, object_codec)
311 order=order, overwrite=overwrite, path=path,
312 chunk_store=chunk_store, filters=filters,
--> 313 object_codec=object_codec)
314
315
/zopt/conda2/envs/test/lib/python3.6/site-packages/zarr/storage.py in _init_array_metadata(store, shape, chunks, dtype, compressor, fill_value, order, overwrite, path, chunk_store, filters, object_codec)
332 shape = normalize_shape(shape)
333 dtype, object_codec = normalize_dtype(dtype, object_codec)
--> 334 chunks = normalize_chunks(chunks, shape, dtype.itemsize)
335 order = normalize_order(order)
336 fill_value = normalize_fill_value(fill_value, dtype)
/zopt/conda2/envs/test/lib/python3.6/site-packages/zarr/util.py in normalize_chunks(chunks, shape, typesize)
122 # handle None in chunks
123 chunks = tuple(s if c is None else int(c)
--> 124 for s, c in zip(shape, chunks))
125
126 return chunks
/zopt/conda2/envs/test/lib/python3.6/site-packages/zarr/util.py in <genexpr>(.0)
122 # handle None in chunks
123 chunks = tuple(s if c is None else int(c)
--> 124 for s, c in zip(shape, chunks))
125
126 return chunks
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'
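The last two frames show where it breaks: normalize_chunks converts each entry of chunks with int(), which works for Zarr's one-integer-per-dimension form but not for the per-dimension tuples Dask supplies. The snippet below re-creates only that expression in plain Python so the failure can be seen in isolation. The name mimic_normalize is hypothetical, and the real Zarr function does more than this one line.

```python
def mimic_normalize(chunks, shape):
    # Same expression as the zarr/util.py frame in the traceback:
    # None means "use the full length of that dimension", anything else goes through int().
    return tuple(s if c is None else int(c) for s, c in zip(shape, chunks))


shape = (100, 110)
print(mimic_normalize((10, None), shape))  # (10, 110)

dask_chunks = ((10,) * 10, (10,) * 11)  # what a Dask Array reports for chunks=10
try:
    mimic_normalize(dask_chunks, shape)
except TypeError as exc:
    # int() receives a tuple such as (10, 10, ..., 10) and raises.
    print("TypeError:", exc)
```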
Problem description
Ideally, I would like to be able to store Dask Arrays in Zarr with little more than __setitem__. In practice this doesn't work. That said, this is closer to a feature request than a bug report. Given that Dask Arrays are not really NumPy arrays, we may need a from_dask_array method.
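Until something like from_dask_array exists, one workaround that stays within existing APIs is to create the Zarr array explicitly with a chunk shape matching the Dask chunks, then let Dask write into it. The sketch below assumes zarr 2.x's Group.create_dataset and Dask's Array.store; the function name store_dask_array is hypothetical, and it deliberately refuses irregular chunking rather than guessing a rechunk. It relies only on duck typing (shape, dtype, chunks, store), so it can be exercised with stand-in objects.

```python
def store_dask_array(group, name, arr):
    """Write a Dask-like array into a Zarr-like group using matching chunks.

    Hypothetical helper. Assumes group.create_dataset(name, shape=..., chunks=...,
    dtype=..., overwrite=...) as in zarr 2.x, and arr.store(target) as on
    dask.array.Array.
    """
    chunk_shape = []
    for sizes in arr.chunks:
        first = sizes[0]
        # All chunks but the last must match; the last may be a partial edge chunk.
        if any(s != first for s in sizes[:-1]) or sizes[-1] > first:
            raise ValueError("irregular chunks %r; rechunk first" % (sizes,))
        chunk_shape.append(first)

    target = group.create_dataset(
        name,
        shape=arr.shape,
        chunks=tuple(chunk_shape),
        dtype=arr.dtype,
        overwrite=True,  # mirrors the overwrite=True that Group.__setitem__ passes
    )
    # Each Dask chunk maps onto exactly one Zarr chunk, so writes do not overlap.
    arr.store(target)
    return target
```

Applied to the reproducer, the call would be store_dask_array(z, "a", a) in place of z["a"] = a.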
Version and installation information
Please provide the following:
- Value of zarr.__version__: 2.2.0
- Value of numcodecs.__version__: 0.5.4
- Version of Python interpreter: 3.6.4
- Operating system (Linux/Windows/Mac): Mac
- How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): conda
Also, if you think it might be relevant, please provide the output from pip freeze or conda env export, depending on which was used to install Zarr.
name: test
channels:
- conda-forge
- defaults
dependencies:
- appnope=0.1.0=py36_0
- asciitree=0.3.3=py36_1
- blas=1.1=openblas
- bokeh=0.12.14=py36_1
- ca-certificates=2018.1.18=0
- certifi=2018.1.18=py36_0
- click=6.7=py_1
- cloudpickle=0.5.2=py_0
- cytoolz=0.9.0.1=py36_0
- dask=0.17.2=py_0
- dask-core=0.17.2=py_0
- decorator=4.2.1=py36_0
- distributed=1.21.4=py36_0
- fasteners=0.14.1=py36_2
- heapdict=1.0.0=py36_0
- ipython=6.2.1=py36_1
- ipython_genutils=0.2.0=py36_0
- jedi=0.11.1=py36_0
- jinja2=2.10=py36_0
- libgfortran=3.0.0=0
- locket=0.2.0=py36_1
- markupsafe=1.0=py36_0
- monotonic=1.4=py36_0
- msgpack-python=0.5.5=py36_0
- ncurses=5.9=10
- numcodecs=0.5.4=py36_0
- numpy=1.14.2=py36_blas_openblas_200
- openblas=0.2.20=7
- openssl=1.0.2n=0
- packaging=17.1=py_0
- pandas=0.22.0=py36_0
- parso=0.1.1=py_0
- partd=0.3.8=py36_0
- pexpect=4.4.0=py36_0
- pickleshare=0.7.4=py36_0
- prompt_toolkit=1.0.15=py36_0
- psutil=5.4.3=py36_0
- ptyprocess=0.5.2=py36_0
- pygments=2.2.0=py36_0
- pyparsing=2.2.0=py36_0
- python=3.6.4=0
- python-dateutil=2.7.1=py_0
- pytz=2018.3=py_0
- pyyaml=3.12=py36_1
- readline=7.0=0
- setuptools=39.0.1=py36_0
- simplegeneric=0.8.1=py36_0
- six=1.11.0=py36_1
- sortedcontainers=1.5.9=py36_0
- sqlite=3.20.1=2
- tblib=1.3.2=py36_0
- tk=8.6.7=0
- toolz=0.9.0=py_0
- tornado=5.0.1=py36_1
- traitlets=4.3.2=py36_0
- wcwidth=0.1.7=py36_0
- xz=5.2.3=0
- yaml=0.1.7=0
- zarr=2.2.0=py_1
- zict=0.1.3=py_0
- zlib=1.2.11=0