Description
The `is_total_slice` function is used to determine whether a selection corresponds to an entire chunk (`is_total_slice` -> `True`) or not (`is_total_slice` -> `False`).
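To make the check concrete, here is a minimal, simplified sketch of what "total" means for a tuple of slices and a chunk shape. It is not zarr's actual implementation (the real `zarr.util.is_total_slice` also handles `Ellipsis` and other selection types), and the name `is_total_slice_sketch` is hypothetical:

```python
def is_total_slice_sketch(item, chunk_shape):
    # Hypothetical, simplified check: True only if every dimension is a
    # step-1 slice running from 0 to the full chunk length.
    if not isinstance(item, tuple) or len(item) != len(chunk_shape):
        return False
    for s, length in zip(item, chunk_shape):
        if not isinstance(s, slice):
            return False
        start = 0 if s.start is None else s.start
        stop = length if s.stop is None else s.stop
        step = 1 if s.step is None else s.step
        if start != 0 or stop != length or step != 1:
            return False
    return True

print(is_total_slice_sketch((slice(0, 9, 1),), (9,)))  # True
print(is_total_slice_sketch((slice(0, 1, 1),), (9,)))  # False
```

Note that the comparison is always against the nominal chunk shape, which is the source of the behavior described next.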
I noticed that `is_total_slice` is always `False` for "partial chunks", i.e. edge chunks that extend past the array's bounds because the array shape is not a multiple of the chunk shape:
```python
import zarr
from zarr.indexing import BasicIndexer
from zarr.util import is_total_slice

# create an array with 1 full chunk and 1 partial chunk
a = zarr.open('test.zarr', path='test', shape=(10,), chunks=(9,), dtype='uint8', mode='w')

# iterate over the per-chunk projections of a full-array selection
for x in BasicIndexer(slice(None), a):
    print(x.chunk_selection, is_total_slice(x.chunk_selection, a._chunks))
```
Which prints this:
```
(slice(0, 9, 1),) True
(slice(0, 1, 1),) False
```
Although the last selection is not the size of a full chunk, it covers every element of that chunk that lies inside the array (index 9 is the only in-bounds element of the second chunk), so it is effectively "total" for that chunk.
A direct consequence of this behavior is unnecessary chunk loading during array assignment: zarr uses the result of `is_total_slice` to decide whether to load existing chunk data. Because `is_total_slice` is always `False` for partial chunks, zarr always loads boundary chunks during assignment, even when the assignment overwrites every in-bounds element of those chunks.
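The read-modify-write decision can be sketched as follows. This is a toy model of the behavior described above, not zarr's code: the dict-based `store`, the `set_chunk` and `covers_chunk` helpers, and the chunk keys `"0"`/`"1"` are all hypothetical. It simulates `a[:] = 1` on the example array and records which chunks would be read:

```python
import numpy as np

def covers_chunk(selection, chunk_shape):
    # Simplified stand-in for a total-slice check: every dimension is a
    # step-1 slice running from 0 to the full chunk length.
    return all(
        isinstance(s, slice)
        and (s.start or 0) == 0
        and (n if s.stop is None else s.stop) == n
        and (s.step or 1) == 1
        for s, n in zip(selection, chunk_shape)
    )

def set_chunk(store, key, chunk_shape, selection, value, reads, dtype="uint8"):
    # Hypothetical write path for one chunk (not zarr's actual code).
    if covers_chunk(selection, chunk_shape):
        # Whole chunk is overwritten: no need to read what is stored.
        chunk = np.empty(chunk_shape, dtype=dtype)
    else:
        # Partial overwrite: read the existing chunk (or start from zeros).
        reads.append(key)
        chunk = store[key].copy() if key in store else np.zeros(chunk_shape, dtype=dtype)
    chunk[selection] = value
    store[key] = chunk

# Mirror a[:] = 1 on shape (10,), chunks (9,): chunk "1" is partial.
store, reads = {}, []
set_chunk(store, "0", (9,), (slice(0, 9, 1),), 1, reads)
set_chunk(store, "1", (9,), (slice(0, 1, 1),), 1, reads)
print(reads)  # ['1']
```

Even though the assignment covers the whole array, the partial chunk `"1"` still goes through the read path.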
A solution would be to extend the `is_total_slice` function to account for partial chunks. I'm not sure yet exactly how to do this, but it shouldn't be hard. I'm happy to open a PR if people agree that this is an issue worth addressing.
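One possible direction, sketched under assumptions: compare the selection against the *in-bounds* extent of the chunk rather than the nominal chunk shape. That extent along each dimension is `min(chunk_len, array_len - chunk_index * chunk_len)`, which needs the chunk's grid coordinates (the indexer's projections already carry a `chunk_coords` field next to `chunk_selection`). The function name and signature below are hypothetical, not an existing zarr API:

```python
def is_total_slice_partial(item, chunk_shape, chunk_coords, array_shape):
    # Hypothetical partial-chunk-aware check: a selection is "total" if it
    # covers every element of the chunk that lies inside the array.
    if len(item) != len(chunk_shape):
        return False
    for s, c, n, dim in zip(item, chunk_coords, chunk_shape, array_shape):
        if not isinstance(s, slice):
            return False
        # Number of elements of this chunk inside the array along this dim.
        effective = min(n, dim - c * n)
        start = 0 if s.start is None else s.start
        stop = effective if s.stop is None else s.stop
        step = 1 if s.step is None else s.step
        # Must start at 0 with step 1 and reach the end of the in-bounds part.
        if start != 0 or step != 1 or stop < effective:
            return False
    return True

# The two projections from the example above (shape (10,), chunks (9,)).
for coords, sel in [((0,), (slice(0, 9, 1),)), ((1,), (slice(0, 1, 1),))]:
    print(sel, is_total_slice_partial(sel, (9,), coords, (10,)))
# (slice(0, 9, 1),) True
# (slice(0, 1, 1),) True
```

With this check, the second (partial) chunk is recognized as fully overwritten, so an assignment path like the one sketched earlier could skip reading it.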