Open
Description
MRE
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr == 3.0.0",
# ]
# ///
from zarr import create_array
import numpy as np
shapes = (2,1), (2,)
for shape in shapes:
data = np.arange(np.prod(shape) , dtype='uint8').reshape(shape)
arr = create_array(
'data/v2_array_example.zarr',
shape=data.shape,
dtype=data.dtype,
zarr_format=2,
compressors='auto',
overwrite=True,
chunk_key_encoding={'name': 'v2', 'configuration': {'separator': '/'}},
)
arr[:] = data # fill the array with values
print(arr.info_complete())
Traceback (the latter bit):
...
File "/Users/bennettd/Library/Caches/uv/archive-v0/Dt_Vb9Xzyh8CGtlKxZyUO/lib/python3.11/site-packages/zarr/storage/_local.py", line 62, in _put
with path.open(mode=mode) as f:
^^^^^^^^^^^^^^^^^^^^
File "/Users/bennettd/.pyenv/versions/3.11.9/lib/python3.11/pathlib.py", line 1044, in open
return io.open(self, mode, buffering, encoding, errors, newline)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IsADirectoryError: [Errno 21] Is a directory: 'data/v2_array_example.zarr/0'
This is a simple bug, but it also reveals the meaning of "overwrite" is not clear.
- I would expect creating an array, then creating an array at the same path with
overwrite=True
to completely remove the pre-existing array. - I'm not sure I would expect
overwrite=True
to erase absolutely everything prefixed underneath the array. So if I had savedcat.jpg
in the same directory as my array, I might not expect that to get erased.
It might be reasonable to define "create_x(overwrite=True)" to mean "ensure that all keys necessary to create object x are available, but no other keys will be touched". But issuing a huge volume of delete_object
requests against metered cloud storage could be much more expensive than a simple "delete_prefix". I don't know much about this, someone good at s3 billing should help here.
Curious to hear what other people think about what "overwrite" should mean here.