Bump Numcodecs requirement to 0.6.2 (#352) · zarr-developers/zarr-python@c4427a4 · GitHub

Commit c4427a4

Bump Numcodecs requirement to 0.6.2 (#352)
* Bump Numcodecs requirement to 0.6.1
* Assert MsgPack round-trips bytes objects correctly
  Previously MsgPack turned bytes objects into unicode objects when round-tripping them.
  This has been fixed in the latest version of Numcodecs, so correct this test now that
  MsgPack works correctly.
* Properly guard against removal of object codec
* Ensure `chunk` in `_decode_chunk` is an `ndarray`
* Reshape `chunk` ourselves since it is an `ndarray`
  As we have already ensured that `chunk` is an `ndarray` viewing the original data, there
  is no need to do that again here. The checks performed by `ensure_contiguous_ndarray`
  are not needed for this use case, particularly as the unusual type cases are already
  handled above, and we do not need to constrain the buffer size. The only thing we really
  need is to flatten the array and make it contiguous, which is handled here directly.
* Refactor `reshape` out of `_decode_chunk`
  As both the expected `object` case and the non-`object` case perform a `reshape` to
  flatten the data, refactor that out of both cases and handle it generally. This
  simplifies the code a bit.
* Consolidate type checks in `_decode_chunk`
  With the `reshape` refactoring having effectively dropped the expected `object` case,
  the type checks were more complicated than needed. To fix this, invert and swap the case
  ordering: handle all generally expected types first and simply cast them, then raise if
  an unexpected `object` type shows up.
* Drop the `ensure_bytes` definition from `zarr.storage`
  Numcodecs now includes a versatile and effective `ensure_bytes` function, so there is no
  need to define our own in `zarr.storage`. Go ahead and drop it.
* Take flattened array views to avoid some copies
  Use Numcodecs' `ensure_contiguous_ndarray` to take `ndarray` views onto buffers to be
  stored, so as to reshape them and avoid a copy (thanks to the buffer protocol). This
  handles datetime/timedelta values by default, catches things like object arrays, and
  flattens the array if needed. All in all, this gets as close to a `bytes` object as
  possible without copying, while preserving type information and producing something that
  fits the buffer protocol.
* Simplify `buffer_size` by using `ensure_ndarray`
  Rewrite `buffer_size` to use Numcodecs' `ensure_ndarray` to get an `ndarray` viewing the
  data; its `nbytes` attribute then gives the number of bytes it takes up.
* Simplify `ensure_str` in `zarr.meta`
  If the data is already a `str` instance, `ensure_str` is a no-op. For all other cases,
  use Numcodecs' `ensure_bytes` to coerce the data through the buffer protocol, and on
  Python 3 decode the resulting `bytes` object to a `str`.
* Bump to Numcodecs 0.6.2
* Update the tutorial's info content
  As the Blosc upgrade included an upgrade of Zstd, the results for this example changed
  slightly, so update them accordingly. This should fix the doctest failure.
2 parents 6214948 + cc1d776 commit c4427a4
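For context, this change leans on the compat helpers that Numcodecs 0.6 exposes. A minimal sketch of how they behave (not part of this commit; assumes numcodecs>=0.6.2 and numpy are installed):

    import numpy as np
    from numcodecs.compat import ensure_ndarray, ensure_bytes

    a = np.arange(10, dtype='i4').reshape(2, 5)

    # ensure_ndarray views any buffer-protocol object as an ndarray, no copy
    v = ensure_ndarray(a)
    assert np.shares_memory(v, a)

    # ensure_bytes coerces through the buffer protocol to an actual bytes object
    b = ensure_bytes(a)
    assert len(b) == a.nbytes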

File tree

9 files changed: 39 additions, 63 deletions


docs/release.rst

Lines changed: 5 additions & 0 deletions
@@ -22,6 +22,11 @@ Enhancements
 Maintenance
 ~~~~~~~~~~~
 
+* The required version of the `numcodecs <http://numcodecs.rtfd.io>`_ package has been upgraded
+  to 0.6.2, which has enabled some code simplification and fixes a failing test involving
+  msgpack encoding. By :user:`John Kirkham <jakirkham>`, :issue:`352`, :issue:`355`,
+  :issue:`324`.
+
 * CI and test environments have been upgraded to include Python 3.7, drop Python 3.4, and
   upgrade all pinned package requirements. :issue:`308`.
 

docs/tutorial.rst

Lines changed: 2 additions & 2 deletions
@@ -178,8 +178,8 @@ print some diagnostics, e.g.::
                        : blocksize=0)
     Store type         : builtins.dict
     No. bytes          : 400000000 (381.5M)
-    No. bytes stored   : 3242241 (3.1M)
-    Storage ratio      : 123.4
+    No. bytes stored   : 3379344 (3.2M)
+    Storage ratio      : 118.4
     Chunks initialized : 100/100
 
 If you don't specify a compressor, by default Zarr uses the Blosc
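The figures above come from the tutorial's compressor example; a rough sketch for reproducing the diagnostics is below. The compressor settings here are assumed rather than copied from the tutorial, and the exact "No. bytes stored" and "Storage ratio" values depend on the installed Blosc/Zstd version, which is why the doc needed updating.

    import numpy as np
    import zarr
    from numcodecs import Blosc

    # assumed compressor settings for illustration
    compressor = Blosc(cname='zstd', clevel=3, shuffle=Blosc.BITSHUFFLE)
    data = np.arange(100000000, dtype='i4').reshape(10000, 10000)
    z = zarr.array(data, chunks=(1000, 1000), compressor=compressor)
    print(z.info)  # reports No. bytes, No. bytes stored, Storage ratio, ...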

requirements_dev.txt

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 asciitree==0.3.3
 fasteners==0.14.1
-numcodecs==0.5.5
+numcodecs==0.6.2

setup.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
         'asciitree',
         'numpy>=1.7',
         'fasteners',
-        'numcodecs>=0.5.3',
+        'numcodecs>=0.6.2',
     ],
     package_dir={'': '.'},
     packages=['zarr', 'zarr.tests'],

zarr/core.py

Lines changed: 16 additions & 11 deletions
@@ -8,6 +8,7 @@
 
 
 import numpy as np
+from numcodecs.compat import ensure_ndarray
 
 
 from zarr.util import (is_total_slice, human_readable_size, normalize_resize_args,
@@ -1743,18 +1744,22 @@ def _decode_chunk(self, cdata):
             for f in self._filters[::-1]:
                 chunk = f.decode(chunk)
 
-        # view as correct dtype
-        if self._dtype == object:
-            if isinstance(chunk, np.ndarray):
-                chunk = chunk.astype(self._dtype)
-            else:
-                raise RuntimeError('cannot read object array without object codec')
-        elif isinstance(chunk, np.ndarray):
+        # view as numpy array with correct dtype
+        chunk = ensure_ndarray(chunk)
+        # special case object dtype, because incorrect handling can lead to
+        # segfaults and other bad things happening
+        if self._dtype != object:
             chunk = chunk.view(self._dtype)
-        else:
-            chunk = np.frombuffer(chunk, dtype=self._dtype)
-
-        # reshape
+        elif chunk.dtype != object:
+            # If we end up here, someone must have hacked around with the filters.
+            # We cannot deal with object arrays unless there is an object
+            # codec in the filter chain, i.e., a filter that converts from object
+            # array to something else during encoding, and converts back to object
+            # array during decoding.
+            raise RuntimeError('cannot read object array without object codec')
+
+        # ensure correct chunk shape
+        chunk = chunk.reshape(-1, order='A')
         chunk = chunk.reshape(self._chunks, order=self._order)
 
         return chunk
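A simplified, standalone sketch mirroring the decode path shown above (not the actual zarr method; the Zstd compressor and the variable names are illustrative): view the decompressed buffer as an ndarray, reinterpret its dtype, then restore the chunk shape, all without copying.

    import numpy as np
    from numcodecs import Zstd
    from numcodecs.compat import ensure_ndarray

    chunks = (10, 10)
    dtype = np.dtype('f8')
    compressor = Zstd(level=1)

    original = np.random.random(chunks)
    cdata = compressor.encode(original)

    chunk = compressor.decode(cdata)
    chunk = ensure_ndarray(chunk)             # view buffer as a (uint8) ndarray
    chunk = chunk.view(dtype)                 # reinterpret bytes as the target dtype
    chunk = chunk.reshape(-1, order='A')      # flatten
    chunk = chunk.reshape(chunks, order='C')  # restore the chunk shape
    assert np.array_equal(chunk, original)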

zarr/meta.py

Lines changed: 5 additions & 9 deletions
@@ -5,24 +5,20 @@
 
 
 import numpy as np
+from numcodecs.compat import ensure_bytes
 
 
-from zarr.compat import PY2, binary_type, Mapping
+from zarr.compat import PY2, Mapping
 from zarr.errors import MetadataError
 
 
 ZARR_FORMAT = 2
 
 
 def ensure_str(s):
-    if PY2:  # pragma: py3 no cover
-        # noinspection PyUnresolvedReferences
-        if isinstance(s, buffer):  # noqa
-            s = str(s)
-    else:  # pragma: py2 no cover
-        if isinstance(s, memoryview):
-            s = s.tobytes()
-        if isinstance(s, binary_type):
+    if not isinstance(s, str):
+        s = ensure_bytes(s)
+        if not PY2:  # pragma: py2 no cover
             s = s.decode('ascii')
     return s
 
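Illustrative usage of the simplified helper on Python 3, per the diff above (a sketch, assuming zarr at this commit with numcodecs>=0.6.2): bytes-like metadata is coerced through `ensure_bytes` and decoded to `str`, while `str` input passes through untouched.

    from zarr.meta import ensure_str

    assert ensure_str('already text') == 'already text'
    assert ensure_str(b'{"zarr_format": 2}') == '{"zarr_format": 2}'
    assert ensure_str(memoryview(b'abc')) == 'abc'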

zarr/storage.py

Lines changed: 5 additions & 25 deletions
@@ -31,15 +31,13 @@
 import warnings
 
 
-import numpy as np
-
-
 from zarr.util import (normalize_shape, normalize_chunks, normalize_order,
                        normalize_storage_path, buffer_size,
                        normalize_fill_value, nolock, normalize_dtype)
 from zarr.meta import encode_array_metadata, encode_group_metadata
-from zarr.compat import PY2, binary_type, OrderedDict_move_to_end
+from zarr.compat import PY2, OrderedDict_move_to_end
 from numcodecs.registry import codec_registry
+from numcodecs.compat import ensure_bytes, ensure_contiguous_ndarray
 from zarr.errors import (err_contains_group, err_contains_array, err_bad_compressor,
                          err_fspath_exists_notdir, err_read_only, MetadataError)
 
@@ -444,23 +442,6 @@ def _init_group_metadata(store, overwrite=False, path=None, chunk_store=None):
     store[key] = encode_group_metadata(meta)
 
 
-def ensure_bytes(s):
-    if isinstance(s, binary_type):
-        return s
-    if isinstance(s, np.ndarray):
-        if PY2:  # pragma: py3 no cover
-            # noinspection PyArgumentList
-            return s.tostring(order='A')
-        else:  # pragma: py2 no cover
-            # noinspection PyArgumentList
-            return s.tobytes(order='A')
-    if hasattr(s, 'tobytes'):
-        return s.tobytes()
-    if PY2 and hasattr(s, 'tostring'):  # pragma: py3 no cover
-        return s.tostring()
-    return memoryview(s).tobytes()
-
-
 def _dict_store_keys(d, prefix='', cls=dict):
     for k in d.keys():
         v = d[k]
@@ -741,9 +722,8 @@ def __getitem__(self, key):
 
     def __setitem__(self, key, value):
 
-        # handle F-contiguous numpy arrays
-        if isinstance(value, np.ndarray) and value.flags.f_contiguous:
-            value = ensure_bytes(value)
+        # coerce to flat, contiguous array (ideally without copying)
+        value = ensure_contiguous_ndarray(value)
 
         # destination path for key
        file_path = os.path.join(self.path, key)
@@ -1192,7 +1172,7 @@ def __getitem__(self, key):
     def __setitem__(self, key, value):
         if self.mode == 'r':
             err_read_only()
-        value = ensure_bytes(value)
+        value = ensure_contiguous_ndarray(value)
         with self.mutex:
             self.zf.writestr(key, value)
 
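A hedged illustration of what the `__setitem__` changes rely on: `ensure_contiguous_ndarray` yields a flat view of any buffer-protocol object, including F-contiguous arrays, without copying, so the stores can write it directly.

    import numpy as np
    from numcodecs.compat import ensure_contiguous_ndarray

    value = np.asfortranarray(np.arange(12, dtype='i4').reshape(3, 4))
    flat = ensure_contiguous_ndarray(value)
    assert flat.ndim == 1
    assert np.shares_memory(flat, value)  # a view, not a copy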

zarr/tests/test_core.py

Lines changed: 1 addition & 1 deletion
@@ -982,7 +982,7 @@ def test_object_arrays(self):
         z[0] = 'foo'
         assert z[0] == 'foo'
         z[1] = b'bar'
-        assert z[1] == 'bar'  # msgpack gets this wrong
+        assert z[1] == b'bar'
         z[2] = 1
         assert z[2] == 1
         z[3] = [2, 4, 6, 'baz']
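The behaviour the updated assertion relies on, as a small sketch (assumes the msgpack package is installed and numcodecs>=0.6.1, where MsgPack round-trips bytes as bytes instead of decoding them to str):

    import zarr
    from numcodecs import MsgPack

    z = zarr.empty(10, chunks=5, dtype=object, object_codec=MsgPack())
    z[0] = 'foo'
    z[1] = b'bar'
    assert z[0] == 'foo'
    assert z[1] == b'bar'  # came back as 'bar' with older numcodecs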

zarr/util.py

Lines changed: 3 additions & 13 deletions
@@ -1,6 +1,5 @@
 # -*- coding: utf-8 -*-
 from __future__ import absolute_import, print_function, division
-import operator
 from textwrap import TextWrapper, dedent
 import numbers
 import uuid
@@ -10,10 +9,11 @@
 from asciitree import BoxStyle, LeftAligned
 from asciitree.traversal import Traversal
 import numpy as np
+from numcodecs.compat import ensure_ndarray
 from numcodecs.registry import codec_registry
 
 
-from zarr.compat import PY2, reduce, text_type, binary_type
+from zarr.compat import PY2, text_type, binary_type
 
 
 # codecs to use for object dtype convenience API
@@ -314,17 +314,7 @@ def normalize_storage_path(path):
 
 
 def buffer_size(v):
-    from array import array as _stdlib_array
-    if PY2 and isinstance(v, _stdlib_array):  # pragma: py3 no cover
-        # special case array.array because does not support buffer
-        # interface in PY2
-        return v.buffer_info()[1] * v.itemsize
-    else:  # pragma: py2 no cover
-        v = memoryview(v)
-        if v.shape:
-            return reduce(operator.mul, v.shape) * v.itemsize
-        else:
-            return v.itemsize
+    return ensure_ndarray(v).nbytes
 
 
 def info_text_report(items):
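A quick check of the simplified helper (a sketch, assuming zarr at this commit with numcodecs>=0.6.2): because `ensure_ndarray` only takes a view, `buffer_size` reports the size of any buffer-protocol object without copying it.

    import numpy as np
    from zarr.util import buffer_size

    assert buffer_size(np.zeros((10, 10), dtype='f8')) == 800
    assert buffer_size(b'spam') == 4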

0 commit comments
